Artem Aliev
Bring Your Own Spark
with Enterprise Security
1 DSE BYOS Overview
2 BYOS Configuration Tools
3 Use Cases
4 BYOS vs OSS Spark Connector
5 Kerberos Demo
© DataStax, All Rights Reserved.
Connect Your Spark to DSE
[Diagram: external Hadoop stack (HDFS, Hive Metastore, Cluster Manager, Spark SQL) connected to the DSE stack (DSE C*, Hive Metastore, CFS, DSE Spark SQL)]
Bring Your Own Spark!
• A simple way to
– Read Cassandra and CFS data from external Spark
– Export necessary configuration info to connect to DSE
• Includes security options
– Export necessary Jars to connect
– Attach these exported resources to a spark-submit
• Also
– Simple way to get the SparkSQL syntax to create catalog entries for tables in Cassandra
– Read external HDFS data from DSE Spark jobs
BYOS Components
• BYOS assembly jar (add it to the Spark jars)
– spark-cassandra-connector, secure transport, CFS and dependencies
$DSE_HOME/clients/dse-byos_2.10-5.0.2.jar
• Spark configuration generator (merge the result with spark-defaults.conf)
– Contains Cassandra host, auth type and factories
dse client-tool configuration byos-export byos.conf
• Spark SQL schema mapping generator (run the result with spark-sql)
– The SQL script creates databases and table mappings for all C* tables
dse client-tool spark sql-schema --all > mapping.sql
byos.conf
#Exported node configuration properties
#Fri Jul 29 22:55:48 UTC 2016
spark.hadoop.cassandra.host=127.0.0.1
spark.hadoop.cassandra.auth.kerberos.enabled=false
spark.cassandra.auth.conf.factory=com.datastax.bdp.spark.DseByosAuthConfFactory
spark.cassandra.connection.port=9042
spark.hadoop.cassandra.ssl.enabled=false
spark.hadoop.cassandra.auth.kerberos.defaultScheme=false
spark.hadoop.cassandra.client.transport.factory=com.datastax.bdp.transport.client.TDseClientTransportFactory
spark.cassandra.connection.host=127.0.0.1
spark.hadoop.fs.cfs.impl=com.datastax.bdp.hadoop.cfs.CassandraFileSystem
spark.hadoop.cassandra.connection.native.port=9042
spark.hadoop.dse.client.configuration.impl=com.datastax.bdp.transport.client.HadoopBasedClientConfiguration
spark.cassandra.connection.factory=com.datastax.bdp.spark.DseCassandraConnectionFactory
spark.hadoop.cassandra.config.loader=com.datastax.bdp.config.DseConfigurationLoader
spark.hadoop.cassandra.connection.rpc.port=9160
spark.hadoop.dse.system_memory_in_mb=7985
spark.hadoop.cassandra.thrift.framedTransportSize=15728640
spark.hadoop.cassandra.partitioner=org.apache.cassandra.dht.Murmur3Partitioner
spark.hadoop.cassandra.dsefs.port=5598
mapping.sql
CREATE DATABASE IF NOT EXISTS test_keyspace;
USE test_keyspace;
CREATE TABLE test_table
USING org.apache.spark.sql.cassandra
OPTIONS (
keyspace "test_keyspace",
table "test_table",
pushdown "true");
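A minimal sketch of how a mapping entry of this shape could be produced by hand for a single table. The helper name mapping_for and the keyspace/table names web and sessions are illustrative assumptions; the real file is generated by dse client-tool.

```shell
#!/bin/sh
# Sketch: print a mapping statement in the same shape as the generated mapping.sql.
# mapping_for is a hypothetical helper, not part of dse client-tool.
mapping_for() {
    keyspace=$1
    table=$2
    cat <<EOF
CREATE DATABASE IF NOT EXISTS $keyspace;
USE $keyspace;
CREATE TABLE $table
USING org.apache.spark.sql.cassandra
OPTIONS (
keyspace "$keyspace",
table "$table",
pushdown "true");
EOF
}

# Example keyspace and table names (assumed for illustration only).
mapping_sql=$(mapping_for web sessions)
printf '%s\n' "$mapping_sql"
```

The output could be appended to mapping.sql and run with spark-sql in the same way as the generated script.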
Add BYOS to Spark
• Copy dse-byos.jar, byos.conf and mapping.sql to a Spark client node
• Merge byos.conf properties with the Spark defaults
cat byos.conf /etc/spark/conf/spark-defaults.conf > merged.conf
• Add DSE table mappings (optional)
spark-sql --jars dse-byos*.jar --properties-file merged.conf -f mapping.sql
Run any Spark application the same way:
spark-shell --jars dse-byos*.jar --properties-file merged.conf
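Because the merge is a plain concatenation, the same key can end up in merged.conf twice. The sketch below lists keys present in both files before merging; the sample file contents and the keys helper are assumptions made up for illustration, and it is worth checking any reported key by hand rather than relying on which value takes effect.

```shell
#!/bin/sh
# Sketch: report property keys defined in both files before concatenating them.
# Sample files are created in a temporary directory so the sketch is self-contained.
workdir=$(mktemp -d)

cat > "$workdir/byos.conf" <<'EOF'
spark.cassandra.connection.host=127.0.0.1
spark.cassandra.connection.port=9042
EOF

cat > "$workdir/spark-defaults.conf" <<'EOF'
# hypothetical cluster defaults
spark.master yarn-client
spark.cassandra.connection.host 10.0.0.5
EOF

# Print the key of every non-comment, non-blank line; a key ends at the first
# whitespace, '=' or ':', which covers both file styles used above.
keys() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]=:].*$//' | sort -u
}

keys "$workdir/byos.conf" > "$workdir/byos.keys"
keys "$workdir/spark-defaults.conf" > "$workdir/defaults.keys"

# comm -12 prints only the lines common to both sorted inputs.
duplicates=$(comm -12 "$workdir/byos.keys" "$workdir/defaults.keys")
echo "keys defined in both files: $duplicates"

# Same merge as on the slide, written into the temporary directory.
cat "$workdir/byos.conf" "$workdir/spark-defaults.conf" > "$workdir/merged.conf"
```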
SSL Support
• Copy the DSE client SSL certificate truststore and keystore files to the Spark nodes
• Pass the file locations to the configuration generator
dse client-tool configuration byos-export \
  --set-truststore-path .truststore --set-truststore-password password \
  --set-keystore-path .keystore --set-keystore-password password \
  byos.conf
• Tip: you can use the spark --files parameter to distribute the files for a YARN job
spark-shell --jars dse-byos*.jar --properties-file merged.conf \
  --files .truststore,.keystore
Kerberos
• Kerberos set up on the Spark cluster:
Just specify the preferred JAAS configuration in .java.login.config
DseClient {
com.sun.security.auth.module.Krb5LoginModule required
useTicketCache=true
renewTGT=true;
};
• No Kerberos on the Spark cluster? (less secure)
Request a DSE token manually while generating the config
dse client-tool configuration byos-export --generate-token byos.conf
[Diagram: Driver, Executors, Kerberos Auth, DSE Token]
Usage: Migrate/Save/Load Data
• DSE tables to Hadoop and back
scala> val df = sqlContext.read.format("org.apache.spark.sql.cassandra").
     |   options(Map("keyspace" -> "t", "table" -> "t")).load()
scala> df.write.format("json").save("/tmp/t.json")
• Streaming
session_stream.saveToCassandra("web", "sessions")
• DSE Max CFS and HDFS
– spark-shell
– dse spark
scala> sc.textFile("hdfs://hadoop1/data").saveAsTextFile("cfs:/data")
scala> sc.textFile("cfs:/data").saveAsTextFile("hdfs://hadoop1/data")
Usage: JOIN/Enrich with C* Tables
• All C* tables are available after mapping
spark-sql> select * from hive_table h join cassandra_table c on h.key = c.key
• Join your RDD with C*
scala> hrdd.joinWithCassandraTable("t", "t")
KILLER FEATURE: Enrich your stream with C* on the fly
click_stream.joinWithCassandraTable("web", "sessions")
Building Full Lambda Architecture?
Add Speed Layer!
[Diagram: DSE]
HBase?
Still HBase?
• HBase: double master/slave architecture, one for the server, one for storage
• DSE: master-less architecture
OSS Spark Connector or DSE BYOS?
Feature | OSS | DSE BYOS
DataStax Official Support | NO | YES
Spark SQL Source Tables / Cassandra DataFrames | YES | YES
CassandraRDD batch and streaming | YES | YES
C* to Spark-SQL table mapping generator | NO | YES
Spark Configuration Generator | NO | YES
Cassandra File System Access | NO | YES
SSL Encryption | YES | YES
User/password authentication | YES | YES
Kerberos authentication | NO | YES
Kerberos Demo
• No time for a live demo. Find me at Meet the Expert for it.
Kerberos Demo
• MIT Kerberos usage is well documented.
• MS Domain Controller will be used
• Cloudera and MapR use MIT Kerberos
• Hortonworks supports Active Directory
• DataStax Enterprise full support:
– Kerberos Auth
– LDAP Auth
– LDAP Roles
Demo Servers
• Domain Controller (Windows 2012 R2 server): Kerberos, Secure LDAP, DNS
– Realm: DC.DATASTAX.COM
– DNS Domain: dc.datastax.com
• 2 Hadoop nodes (h1, h2): Spark 1.6.1, Hadoop 2.7, BYOS 5.0.2, Ubuntu LTS 14.04
• 2 DataStax Enterprise 5.0 nodes (c1, c2): DSE 5.0.2, Ubuntu LTS 14.04
Domain Controller Setup
• DNS forward and reverse zones
• Secure LDAP
• Ambari setup wizard
• LDAP DseRoleManager (Optional)
• Organizational Units for Hadoop and DSE users/principals
Linux: Join the Domain (Optional)
• REALMD and SSSD
#> apt-get install realmd sssd samba-common samba-common-bin samba-libs \
   sssd-tools krb5-user adcli packagekit vim ntp -y
#> realm --verbose join -U Administrator DC.DATASTAX.COM
# optional: create home directories for domain users
#> echo 'session required pam_mkhomedir.so skel=/etc/skel/ umask=0022' >> /etc/pam.d/common-session
• Various workarounds/additional steps for your Linux distribution will be required
#> ln -s /usr/lib/x86_64-linux-gnu/ldb /usr/lib/x86_64-linux-gnu/samba
• Security will need to be tuned
Ambari Kerberos Wizard
• Admin -> Kerberos -> Active Directory
• DC data:
• next, next, next
That will create a bunch of Windows users and keytabs for them
• Configure Hadoop component security and permissions
DataStax Enterprise
On Windows:
• Create a 'dse' user in the GUI.
• Create DSE keytabs for each node:
c:> ktpass -princ HTTP/c1.dc.datastax.com@DC.DATASTAX.COM -mapUser dse -pass ****** -crypto all -out tmp.keytab
c:> ktpass -princ dse/c1.dc.datastax.com@DC.DATASTAX.COM -mapUser dse -pass ****** -crypto all -in tmp.keytab -out c1.keytab
• Copy the keytabs to the appropriate nodes
Enable Kerberos on DSE nodes:
https://docs.datastax.com/en/datastax_enterprise/5.0/datastax_enterprise/unifiedAuth/configAuthenticate.html
DataStax Enterprise
• dse.yaml
authenticator: com.datastax.bdp.cassandra.auth.DseAuthenticator
authorizer: com.datastax.bdp.cassandra.auth.DseAuthorizer
authentication_options:
    enabled: true
kerberos_options:
• Replace the default cassandra user:
cqlsh> create role 'cassandra@DC.DATASTAX.COM' with SUPERUSER = true AND LOGIN = true;
• User for the Hadoop Spark Thrift Server:
cqlsh> create role 'hive/hdp0.dc.datastax.com@DC.DATASTAX.COM' with LOGIN = true;
BYOS
• Generate byos.conf the usual way
dse client-tool configuration byos-export byos.conf
• Create .java.login.config in the Hadoop user's home directory:
DseClient {
com.sun.security.auth.module.Krb5LoginModule required
useTicketCache=true
renewTGT=true;
};
• Keytab usage can also be configured in this file
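A minimal sketch of what a keytab-based variant of the login file might look like, using the standard Krb5LoginModule options useKeyTab, keyTab, principal and doNotPrompt. The keytab path and principal are hypothetical placeholders, and the file is written to a temporary directory here rather than to the real home directory.

```shell
#!/bin/sh
# Sketch: write a JAAS login file that logs in from a keytab instead of the ticket cache.
# KEYTAB and PRINCIPAL are hypothetical placeholders; on a real client node TARGET
# would be "$HOME/.java.login.config".
KEYTAB="/etc/security/keytabs/spark.user.keytab"
PRINCIPAL="sparkuser@DC.DATASTAX.COM"
TARGET="$(mktemp -d)/java.login.config"

# Unquoted heredoc delimiter so that $KEYTAB and $PRINCIPAL are expanded.
cat > "$TARGET" <<EOF
DseClient {
  com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    keyTab="$KEYTAB"
    principal="$PRINCIPAL"
    doNotPrompt=true;
};
EOF

cat "$TARGET"
```

With a keytab entry like this no interactive kinit is needed, which tends to suit long-running services better than a ticket cache.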
Spark
#> kinit
Password for cassandra@DC.DATASTAX.COM:
• Add CFS to the spark.yarn.access.namenodes property to request a C* token.
#> spark-shell --master yarn-client --jars dse-byos*.jar --properties-file merged.conf \
   --conf spark.yarn.access.namenodes=cfs://node1/
Spark Thrift Server
Start:
#> kinit -kt /etc/security/keytabs/hive.service.keytab hive/hdp0.dc.datastax.com@DC.DATASTAX.COM
#> cat /etc/spark/conf/spark-thrift-sparkconf.conf byos.conf > byos-thrift.conf
#> start-thriftserver.sh --properties-file byos-thrift.conf --jars dse-byos*.jar
Connect:
#> kinit
#> beeline -u 'jdbc:hive2://hdp0:10015/default;principal=hive/_HOST@DC.DATASTAX.COM'
Bring Your Own Spark!
[Diagram: external Hadoop stack (HDFS, Hive Metastore, Cluster Manager (YARN), Spark SQL) connected to the DSE stack (Cassandra, Hive Metastore, CFS, DSE Spark SQL)]

W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 

DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev) | Cassandra Summit 2016

  • 1. Artem Aliev Bring Your Own Spark with Enterprise Security
  • 2. 1 DSE BYOS Overview; 2 BYOS Configuration Tools; 3 Use Cases; 4 BYOS vs OSS Spark Connector; 5 Kerberos Demo. © DataStax, All Rights Reserved.
  • 3. Connect Your Spark to DSE [Diagram: HDFS, Hive Metastore, Cluster Manager, Spark SQL; DSE C*; Hive Metastore, CFS, DSE Spark SQL]
  • 4. Connect Your Spark to DSE [Diagram: HDFS, Hive Metastore, Cluster Manager, Spark SQL; Hive Metastore, CFS, DSE Spark SQL; DSE C*]
  • 5. Bring Your Own Spark!
      • A simple way to
          – Read Cassandra and CFS data from external Spark
          – Export the configuration needed to connect to DSE
              • Includes security options
          – Export the jars needed to connect
          – Attach these exported resources to a spark-submit
      • Also
          – A simple way to get the Spark SQL syntax that creates catalog entries for tables in Cassandra
          – Read external HDFS data from DSE Spark jobs
  • 6. BYOS Components
      • BYOS assembly jar (add it to the Spark jars)
          • spark-cassandra-connector, secure transport, CFS and dependencies
          $DSE_HOME/clients/dse-byos_2.10-5.0.2.jar
      • Spark configuration generator (merge the result with spark-defaults.conf)
          • Contains the Cassandra host, auth type and factories
          dse client-tool configuration byos-export byos.conf
      • Spark SQL schema mapping generator (run the result with spark-sql)
          • The SQL script creates databases and table mappings for all C* tables
          dse client-tool spark sql-schema --all > mapping.sql
  • 7. byos.conf
      #Exported node configuration properties
      #Fri Jul 29 22:55:48 UTC 2016
      spark.hadoop.cassandra.host=127.0.0.1
      spark.hadoop.cassandra.auth.kerberos.enabled=false
      spark.cassandra.auth.conf.factory=com.datastax.bdp.spark.DseByosAuthConfFactory
      spark.cassandra.connection.port=9042
      spark.hadoop.cassandra.ssl.enabled=false
      spark.hadoop.cassandra.auth.kerberos.defaultScheme=false
      spark.hadoop.cassandra.client.transport.factory=com.datastax.bdp.transport.client.TDseClientTransportFactory
      spark.cassandra.connection.host=127.0.0.1
      spark.hadoop.fs.cfs.impl=com.datastax.bdp.hadoop.cfs.CassandraFileSystem
      spark.hadoop.cassandra.connection.native.port=9042
      spark.hadoop.dse.client.configuration.impl=com.datastax.bdp.transport.client.HadoopBasedClientConfiguration
      spark.cassandra.connection.factory=com.datastax.bdp.spark.DseCassandraConnectionFactory
      spark.hadoop.cassandra.config.loader=com.datastax.bdp.config.DseConfigurationLoader
      spark.hadoop.cassandra.connection.rpc.port=9160
      spark.hadoop.dse.system_memory_in_mb=7985
      spark.hadoop.cassandra.thrift.framedTransportSize=15728640
      spark.hadoop.cassandra.partitioner=org.apache.cassandra.dht.Murmur3Partitioner
      spark.hadoop.cassandra.dsefs.port=5598
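Before merging, it can help to check what the export actually switched on (Kerberos, SSL, ports). The following is a minimal sketch, assuming the key=value layout of the byos.conf shown above; the helper name get_prop is hypothetical and is not part of any DSE tool.

```shell
#!/bin/sh
# get_prop FILE KEY: print the value of KEY from a key=value properties file.
# Hypothetical helper. Comment lines (#...) are skipped; if the key appears
# more than once, the last value is printed. Exit status is 1 if the key is absent.
get_prop() {
  awk -F= -v key="$2" '
    /^[ \t]*#/ { next }
    $1 == key  { sub(/^[^=]*=/, ""); val = $0; found = 1 }
    END        { if (found) print val; else exit 1 }
  ' "$1"
}
```

For example, `get_prop byos.conf spark.hadoop.cassandra.auth.kerberos.enabled` reports whether the exported configuration expects Kerberos.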
  • 8. mapping.sql
      CREATE DATABASE IF NOT EXISTS test_keyspace;
      USE test_keyspace;
      CREATE TABLE test_table
        USING org.apache.spark.sql.cassandra
        OPTIONS (
          keyspace "test_keyspace",
          table "test_table",
          pushdown "true");
  • 9. Add BYOS to Spark
      • Copy dse-byos.jar, byos.conf and mapping.sql to a Spark client node
      • Merge the byos.conf properties with the Spark defaults
          cat byos.conf /etc/spark/conf/spark-defaults.conf > merged.conf
      • Add the DSE table mapping (optional)
          spark-sql --jars dse-byos*.jar --properties-file merged.conf -f mapping.sql
      Run any Spark application the same way:
          spark-shell --jars dse-byos*.jar --properties-file merged.conf
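The `cat` merge on this slide is a plain concatenation, so a key defined in both files ends up in merged.conf twice. Assuming the later line wins when a key is repeated (as it does with Java properties loading), the values from spark-defaults.conf would take precedence in the order shown. A small sketch for spotting such overlaps before merging; the function names prop_keys and conflicting_keys are hypothetical:

```shell
#!/bin/sh
# prop_keys FILE: list the distinct property keys in FILE, sorted.
# A key ends at the first '=' or whitespace, which covers both the key=value
# lines of byos.conf and the "key value" lines common in spark-defaults.conf.
prop_keys() {
  awk '
    /^[ \t]*#/ || /^[ \t]*$/ { next }
    { sub(/^[ \t]+/, ""); split($0, kv, /[ \t=]+/); print kv[1] }
  ' "$1" | sort -u
}

# conflicting_keys A B: print the keys defined in both files, one per line.
conflicting_keys() {
  a=$(mktemp) && b=$(mktemp) || return 1
  prop_keys "$1" > "$a"
  prop_keys "$2" > "$b"
  comm -12 "$a" "$b"   # lines common to both sorted key lists
  rm -f "$a" "$b"
}
```

Running `conflicting_keys byos.conf /etc/spark/conf/spark-defaults.conf` and getting no output means the concatenation order does not matter for this pair of files.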
  • 10. SSL Support
      • Copy the DSE client SSL certificate truststore and keystore files to the Spark nodes
      • Pass the file locations to the configuration generator
          dse client-tool configuration byos-export --set-truststore-path .truststore --set-truststore-password password --set-keystore-path .keystore --set-keystore-password password byos.conf
      • Tip: you can use the --files Spark parameter to distribute the files for a YARN job
          spark-shell --jars dse-byos*.jar --properties-file merged.conf --files .truststore,.keystore
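The --files tip only helps if the store files are actually present where the job is submitted. A minimal sketch of a guard around it, assuming the .truststore and .keystore names from the slide; the helper name files_arg is hypothetical:

```shell
#!/bin/sh
# files_arg PATH...: check that every file exists, then print the paths
# comma-separated, which is the form the --files option takes.
# Returns 1 and names the first missing file on stderr otherwise.
files_arg() {
  out=""
  for f in "$@"; do
    [ -f "$f" ] || { echo "missing file: $f" >&2; return 1; }
    out="${out:+$out,}$f"
  done
  printf '%s\n' "$out"
}
```

It could be used as `spark-shell --jars dse-byos*.jar --properties-file merged.conf --files "$(files_arg .truststore .keystore)"`.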
  • 11. Kerberos
      • Kerberos set up on the Spark cluster: just specify the preferred JAAS configuration in .java.login.config
          DseClient {
              com.sun.security.auth.module.Krb5LoginModule required
              useTicketCache=true
              renewTGT=true;
          };
      • No Kerberos on the Spark cluster? (less secure) Request a DSE token manually while generating the config
          dse client-tool configuration byos-export --generate-token byos.conf
      [Diagram labels: Driver, Executors, Kerberos Auth, DSE Token]
  • 12. Usage: Migrate/Save/Load Data
      • DSE tables to Hadoop and back
      • Streaming
      • DSE Max CFS and HDFS
          • spark-shell
          • dse spark
      Examples:
          scala> sc.textFile("hdfs://hadoop1/data").saveAsTextFile("cfs:/data")
          scala> val df = sqlContext.read.format("org.apache.spark.sql.cassandra")
                   .options(Map("keyspace" -> "t", "table" -> "t")).load()
                 df.write.format("json").save("/tmp/t.json")
          scala> sc.textFile("cfs:/data").saveAsTextFile("hdfs://hadoop1/data")
          session_stream.saveToCassandra("web", "sessions")
  • 13. Usage: JOIN/Enrich with C* Tables
      • All C* tables are available after mapping
          spark-sql> select * from hive_table h join cassandra_table c on h.key = c.key
      • Join your RDD with C*
          scala> hrdd.joinWithCassandraTable("t", "t")
      • KILLER FEATURE: enrich your stream with C* on the fly
          click_stream.joinWithCassandraTable("web", "sessions")
  • 14. Building Full Lambda Architecture?
  • 15. Add Speed Layer! [Diagram labels: DSE, DSE]
  • 16. HBase?
  • 17. Still HBase?
      • Double master/slave architecture: one for server, one for storage
      • Master-less architecture
  • 18. OSS Spark Connector or DSE BYOS?
      Feature                                               OSS   DSE BYOS
      DataStax Official Support                             NO    YES
      Spark SQL Source Tables / Cassandra DataFrames        YES   YES
      CassandraRDD batch and streaming                      YES   YES
      C* to Spark-SQL table mapping generator               NO    YES
      Spark Configuration Generator                         NO    YES
      Cassandra File System Access                          NO    YES
      SSL Encryption                                        YES   YES
      User/password authentication                          YES   YES
      Kerberos authentication                               NO    YES
  • 20. Kerberos Demo • No time for a live demo. Find me at Meet Expert for it.
  • 27. Kerberos Demo
      • MIT Kerberos usage is well documented.
      • MS Domain Controller will be used
      • Cloudera and MapR use MIT Kerberos
      • Hortonworks supports Active Directory
      • DataStax Enterprise full support:
          • Kerberos Auth
          • LDAP Auth
          • LDAP Roles
  • 28. Demo Servers
      • Realm: DC.DATASTAX.COM
      • DNS Domain: dc.datastax.com
      • Windows 2012 R2 server (Domain Controller: Kerberos, Secure LDAP, DNS)
      • 2 Hadoop nodes (h1, h2): Spark 1.6.1, Hadoop 2.7, BYOS 5.0.2, Ubuntu LTS 14.04
      • 2 DataStax Enterprise 5.0 nodes (c1, c2): DSE 5.0.2, Ubuntu LTS 14.04
  • 29. Domain Controller Setup
      • DNS forward and reverse zones
      • Secure LDAP
      • Ambari setup wizard
      • LDAP DseRoleManager (optional)
      • Organization Units for Hadoop and DSE users/principals
  • 30. Linux Join the Domain (Optional)
      • REALMD and SSSD
          #> apt-get install realmd sssd samba-common samba-common-bin samba-libs sssd-tools krb5-user adcli packagekit vim ntp -y
          #> realm --verbose join -U Administrator DC.DATASTAX.COM
          # optional: create home directories for domain users
          #> echo 'session required pam_mkhomedir.so skel=/etc/skel/ umask=0022' >> /etc/pam.d/common-session
      • Various workarounds and additional steps for your Linux will be required
          #> ln -s /usr/lib/x86_64-linux-gnu/ldb /usr/lib/x86_64-linux-gnu/samba
      • Security will need to be tuned
  • 31. Ambari Kerberos Wizard
      • Admin -> Kerberos -> Active Directory
      • DC data:
      • next, next, next
      That will create a bunch of Windows users and keytabs for them
      • Configure Hadoop component security and permissions
  • 32. DataStax Enterprise
      On Windows:
      • Create a 'dse' user in the GUI.
      • Create DSE keytabs for each node:
          c:\>ktpass -princ HTTP/c1.dc.datastax.com@DC.DATASTAX.COM -mapUser dse -pass ****** -crypto all -out tmp.keytab
          c:\>ktpass -princ dse/c1.dc.datastax.com@DC.DATASTAX.COM -mapUser dse -pass ****** -crypto all -in tmp.keytab -out c1.keytab
      • Copy the keytabs to the appropriate nodes
      Enable Kerberos on the DSE nodes:
      https://docs.datastax.com/en/datastax_enterprise/5.0/datastax_enterprise/unifiedAuth/configAuthenticate.html
  • 33. DataStax Enterprise
      • dse.yaml
          authenticator: com.datastax.bdp.cassandra.auth.DseAuthenticator
          authorizer: com.datastax.bdp.cassandra.auth.DseAuthorizer
          authentication_options:
              enabled: true
          kerberos_options:
      • Replace the default cassandra user:
          cqlsh> create role 'cassandra@DC.DATASTAX.COM' with SUPERUSER = true AND LOGIN = true;
      • User for the Hadoop Spark Thrift Server:
          cqlsh> create role 'hive/hdp0.dc.datastax.com@DC.DATASTAX.COM' with LOGIN = true;
  • 34. BYOS
      • Generate byos.conf the usual way
          dse client-tool configuration byos-export byos.conf
      • Create .java.login.config in the Hadoop user's home directory:
          DseClient {
              com.sun.security.auth.module.Krb5LoginModule required
              useTicketCache=true
              renewTGT=true;
          };
      • Keytab usage can also be configured in this file
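The slide notes that a keytab can be configured in the same file but does not show how. A hedged sketch of what such an entry might look like: useKeyTab, keyTab and principal are standard Krb5LoginModule options, while the function name write_jaas_keytab and the arguments passed to it are illustrative assumptions rather than anything shown in the deck.

```shell
#!/bin/sh
# write_jaas_keytab FILE KEYTAB PRINCIPAL: write a DseClient JAAS entry that
# logs in from a keytab instead of the ticket cache.
# Hypothetical helper; the keytab path and principal are supplied by the caller.
write_jaas_keytab() {
  cat > "$1" <<EOF
DseClient {
  com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    keyTab="$2"
    principal="$3";
};
EOF
}
```

With the service keytab and principal used for the Thrift Server in slide 36, a call could look like `write_jaas_keytab "$HOME/.java.login.config" /etc/security/keytabs/hive.service.keytab hive/hdp0.dc.datastax.com@DC.DATASTAX.COM`.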
  • 35. Spark
          #> kinit
          Password for cassandra@DC.DATASTAX.COM:
      • Add CFS to the spark.yarn.access.namenodes property to request a C* token.
          #> spark-shell --master yarn-client --jars dse-byos*.jar --properties-file merged.conf --conf spark.yarn.access.namenodes=cfs://node1/
  • 36. Spark Thrift Server
      Start:
          #> kinit -kt /etc/security/keytabs/hive.service.keytab hive/hdp0.dc.datastax.com@DC.DATASTAX.COM
          #> cat /etc/spark/conf/spark-thrift-sparkconf.conf byos.conf > byos-thrift.conf
          #> start-thriftserver.sh --properties-file byos-thrift.conf --jars dse-byos*.jar
      Connect:
          #> kinit
          #> beeline -u 'jdbc:hive2://hdp0:10015/default;principal=hive/_HOST@DC.DATASTAX.COM'
  • 37. Bring Your Own Spark! [Diagram: HDFS, Hive Metastore, Cluster Manager (YARN), Spark SQL; Cassandra; Hive Metastore, CFS, DSE Spark SQL]

Editor's Notes

  1. It is not a Way of the Samurai
  2. It is not a Way of the Samurai
  3. It is not a Way of the Samurai
  4. That’s the way!