SlideShare a Scribd company logo
1 of 27
When Rotten Tomatoes
Isn’t Enough
1 © DataStax, All Rights Reserved. Confidential
Analyzing Twitter Movie Reviews
using DataStax Enterprise
Amanda K Moran
Technical Evangelist for DataStax
What Are We Talking About Today
• What Problem Are We Trying To Solve?
• Introduction to Apache Cassandra
– What is it?
– Why do I need it?
• Introduction To Apache Spark
– What is it?
– Why do I need it?
• What is Sentiment Analysis?
• How does DataStax Enterprise Analytics Help Us?
• Overview of Demo
• Actual Demo!
2 © DataStax, All Rights Reserved. Confidential
But first… A Little About Amanda
• California Native born in Redlands
• Graduated with MS in Computer Science and
Engineering from Santa Clara University in 2012
• Worked as a Software Engineer for 6 years at
Lockheed Martin, HP, Teradata, Esgyn and now
DataStax as a Technical Evangelist
• Apache Committer, PMC Member, and initial
contributor to all installation and deployment work for
Apache Trafodion
• Keywords: Disney, Cloud, Dogs, Veggies, Linux,
Databases, Big Data, Analytics, Testing, and
Running
© DataStax, All Rights Reserved.3 Confidential
What Problem Are We
Really Trying to Solve?
4 © DataStax, All Rights Reserved. Confidential
What Movie Should I See?
• Wouldn’t it be great if I could ask 1 million people this question?
• Wouldn’t it be great if I could automate this process?
• Data Analytics doesn’t have to be complicated!
• We are going to utilize the power of Big Data using
– Apache Cassandra
– Apache Spark
– Spark Machine Learning library
– Jupyter notebooks
– Python
– Twitter Tweets and API
– Pattern for Sentiment Analysis
© DataStax, All Rights Reserved. Confidential5
(Brief) Introduction to
Apache Cassandra
6 © DataStax, All Rights Reserved. Confidential
What is Apache Cassandra?
• First developed by Facebook
• Became a top-level Apache
Foundation project in 2010
• Distributed, decentralized database
• Elastic scalability -- add/remove nodes with no downtime
• High performance -- very fast
• High availability / fault tolerant -- no single point of failure
• Solves many of the problems faced with a traditional DB for certain
workloads
© DataStax, All Rights Reserved. Confidential7
What is DataStax Enterprise?
• DataStax has been some of the key contributors to the Cassandra
project
– DataStax Enterprise is a commercial product that provides
• More cool features
• More QA
• More support
© DataStax, All Rights Reserved. Confidential8
What Does All This Mean?
• Let’s talk about 4 Big Topics:
– Distributed
– Replication
– Elastically Scalable
– High Availability
© DataStax, All Rights Reserved. Confidential9
Note: Don’t forget this is just a brief intro!
Distributed
• Every node in the cluster has the same role
– Really!
– Cassandra does not have a Master-Worker Architecture
• Any client can connect to any node
– All nodes are Read and Write ready
• But this is not to say that all nodes
contain all data
© DataStax, All Rights Reserved. Confidential10
Replication
• To be able to survive a node going down data must be
copied to other nodes
• The Replication Factor (RF) is set by the user
– 1-Number of nodes in the Cluster (not recommended)
• The data is asynchronously replicated
– Automatic
– Peer-to-peer communication
© DataStax, All Rights Reserved. Confidential11
Elastically Scalable
• As more nodes are added, performance increases linearly
• You can scale up or down with no downtime
– Not even a restart!
• Reads and Writes both scale
© DataStax, All Rights Reserved. Confidential12
High Availability
• The lack of a Master node allows for high availability
– No single point of failure
• Replication allows nodes to fail and data to still be
available
– Cassandra expects nodes to fail and doesn’t panic
• Multiple Data Center support is out of the box
© DataStax, All Rights Reserved. Confidential13
One Small Trade Off
• The CAP Theorem
– Availability
– Consistency
– Partitioning
• You can’t have it all --Impossible
– Cassandra chooses to have
eventual consistency as the default
• But you can prioritize consistency over availability
– Consistency levels are configurable!
© DataStax, All Rights Reserved. Confidential14
Why Do I Need Cassandra?
• Think about your application:
– Do you have Big Data (a lot of it!)?
– Do you need to be able to read/write fast?
– Do you need to be able to scale up/down easily?
– Do you need High Availability?
– Do you need multiple data center support?
• Multi-cloud/hybrid cloud support?
If you application needs any of these things, you want to
consider Cassandra!
© DataStax, All Rights Reserved. Confidential15
(Small) Introduction to
Apache Spark
16 © DataStax, All Rights Reserved. Confidential
What is Apache Spark?
• “Apache Spark is a unified analytics engine for large-scale
data processing” --https://spark.apache.org
• 100 times faster than Hadoop for analytics
– Utilizes in-memory processing
– Amazing parallelism
• Machine Learning library -- Spark MLlib
© DataStax, All Rights Reserved. Confidential17
Why Do I Need Spark?
• Think about your application:
– Do you have Big Data?
– Do you need High Availability?
– Do you need analytics at lightning speed?
– Do you need a simple way to get insights into your data?
If you application needs any of these things, you want to
consider Spark!
•
© DataStax, All Rights Reserved. Confidential18
What is Sentiment
Analysis?
19 © DataStax, All Rights Reserved. Confidential
What is Sentiment Analysis?
• Sentiment Analysis at a high level is very simple
• Natural Language processing and text analytics to
determine if a word or sentence
– Positive
– Negative
– Neutral
• This is easy to understand, but difficult for machines to
learn how to do!
© DataStax, All Rights Reserved. Confidential20
How does DataStax
Enterprise Analytics
Help Us?
21 © DataStax, All Rights Reserved. Confidential
Cassandra vs DSE Analytics
22 © DataStax, All Rights Reserved. Confidential
Cassandra DSE Analytics
Spark integration No -- connectors available Yes
High Availability HDFS, Spark Resource Manager No Single Point of Failure
Support Jira, emails, google! Dedicated support
Deployment Manual installation DSE Ops Center
One unified platform to do all this complex work!
Demo Overview
23 © DataStax, All Rights Reserved. Confidential
Analyzing Twitter with DSE Analytics and Jupyter
• Local DSE Analytics Setup
• Local Jupyter Setup
• Pull twitter data on a movie title
• Clean up the tweets
• Insert into Cassandra
• Create Spark Dataframe
• Use Spark ML
• Sentiment Analysis with Pattern
– Gives positive/negative
• Take average of these scores
• Should I see this movie??
© DataStax, All Rights Reserved. Confidential24
DemoCassandra, Spark, Python, Jupyter
Notebooks, Cassandra Python driver,
Tweets, Twitter API, and Pattern
© DataStax, All Rights Reserved. Confidential25
Information and Links
• Get this demo: https://github.com/amandamoran/bigdatadayla
– More analytics to come!
• Learn more about Cassandra: https://academy.datastax.com/
• Learn more about Spark: https://spark.apache.org/
• Learn more about DataStax: https://www.datastax.com/
• Follow me on Twitter: @AmandaDataStax
© DataStax, All Rights Reserved. Confidential26
Thank you
27 © DataStax, All Rights Reserved. Confidential

More Related Content

What's hot

How much money do you lose every time your ecommerce site goes down?
How much money do you lose every time your ecommerce site goes down?How much money do you lose every time your ecommerce site goes down?
How much money do you lose every time your ecommerce site goes down?DataStax
 
Webinar: DataStax Managed Cloud: focus on innovation, not administration
Webinar:  DataStax Managed Cloud: focus on innovation, not administrationWebinar:  DataStax Managed Cloud: focus on innovation, not administration
Webinar: DataStax Managed Cloud: focus on innovation, not administrationDataStax
 
Solr consistency and recovery internals
Solr consistency and recovery internalsSolr consistency and recovery internals
Solr consistency and recovery internalsCloudera, Inc.
 
Webinar: 2 Billion Data Points Each Day
Webinar: 2 Billion Data Points Each DayWebinar: 2 Billion Data Points Each Day
Webinar: 2 Billion Data Points Each DayDataStax
 
Building a Digital Bank
Building a Digital BankBuilding a Digital Bank
Building a Digital BankDataStax
 
Webinar: Comparing DataStax Enterprise with Open Source Apache Cassandra
Webinar: Comparing DataStax Enterprise with Open Source Apache CassandraWebinar: Comparing DataStax Enterprise with Open Source Apache Cassandra
Webinar: Comparing DataStax Enterprise with Open Source Apache CassandraDataStax
 
Webinar: Become PSD2 ready with DataStax
Webinar: Become PSD2 ready with DataStaxWebinar: Become PSD2 ready with DataStax
Webinar: Become PSD2 ready with DataStaxDataStax
 
Webinar - Bringing connected graph data to Cassandra with DSE Graph
Webinar - Bringing connected graph data to Cassandra with DSE GraphWebinar - Bringing connected graph data to Cassandra with DSE Graph
Webinar - Bringing connected graph data to Cassandra with DSE GraphDataStax
 
Don't Let Your Shoppers Drop; 5 Rules for Today’s eCommerce
Don't Let Your Shoppers Drop; 5 Rules for Today’s eCommerceDon't Let Your Shoppers Drop; 5 Rules for Today’s eCommerce
Don't Let Your Shoppers Drop; 5 Rules for Today’s eCommerceDataStax
 
Data Drive Applications_Webinar
Data Drive Applications_WebinarData Drive Applications_Webinar
Data Drive Applications_WebinarSean Spediacci
 
Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da...
Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da...Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da...
Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da...DataStax
 
Designing a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDesigning a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDataStax
 
Apache Impala (incubating) 2.5 Performance Update
Apache Impala (incubating) 2.5 Performance UpdateApache Impala (incubating) 2.5 Performance Update
Apache Impala (incubating) 2.5 Performance UpdateCloudera, Inc.
 
Webinar - Delivering Enhanced Message Processing at Scale With an Always-on D...
Webinar - Delivering Enhanced Message Processing at Scale With an Always-on D...Webinar - Delivering Enhanced Message Processing at Scale With an Always-on D...
Webinar - Delivering Enhanced Message Processing at Scale With an Always-on D...DataStax
 
Engaging with Cloudera & Morning Wrap Up
Engaging with Cloudera & Morning Wrap UpEngaging with Cloudera & Morning Wrap Up
Engaging with Cloudera & Morning Wrap UpCloudera, Inc.
 
How to Successfully Visualize DSE Graph data
How to Successfully Visualize DSE Graph dataHow to Successfully Visualize DSE Graph data
How to Successfully Visualize DSE Graph dataDataStax
 
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...Cloudera, Inc.
 
Cloudera Breakfast Series, Analytics Part 1: Use All Your Data
Cloudera Breakfast Series, Analytics Part 1: Use All Your DataCloudera Breakfast Series, Analytics Part 1: Use All Your Data
Cloudera Breakfast Series, Analytics Part 1: Use All Your DataCloudera, Inc.
 
End to End Streaming Architectures
End to End Streaming ArchitecturesEnd to End Streaming Architectures
End to End Streaming ArchitecturesCloudera, Inc.
 
Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8Cloudera, Inc.
 

What's hot (20)

How much money do you lose every time your ecommerce site goes down?
How much money do you lose every time your ecommerce site goes down?How much money do you lose every time your ecommerce site goes down?
How much money do you lose every time your ecommerce site goes down?
 
Webinar: DataStax Managed Cloud: focus on innovation, not administration
Webinar:  DataStax Managed Cloud: focus on innovation, not administrationWebinar:  DataStax Managed Cloud: focus on innovation, not administration
Webinar: DataStax Managed Cloud: focus on innovation, not administration
 
Solr consistency and recovery internals
Solr consistency and recovery internalsSolr consistency and recovery internals
Solr consistency and recovery internals
 
Webinar: 2 Billion Data Points Each Day
Webinar: 2 Billion Data Points Each DayWebinar: 2 Billion Data Points Each Day
Webinar: 2 Billion Data Points Each Day
 
Building a Digital Bank
Building a Digital BankBuilding a Digital Bank
Building a Digital Bank
 
Webinar: Comparing DataStax Enterprise with Open Source Apache Cassandra
Webinar: Comparing DataStax Enterprise with Open Source Apache CassandraWebinar: Comparing DataStax Enterprise with Open Source Apache Cassandra
Webinar: Comparing DataStax Enterprise with Open Source Apache Cassandra
 
Webinar: Become PSD2 ready with DataStax
Webinar: Become PSD2 ready with DataStaxWebinar: Become PSD2 ready with DataStax
Webinar: Become PSD2 ready with DataStax
 
Webinar - Bringing connected graph data to Cassandra with DSE Graph
Webinar - Bringing connected graph data to Cassandra with DSE GraphWebinar - Bringing connected graph data to Cassandra with DSE Graph
Webinar - Bringing connected graph data to Cassandra with DSE Graph
 
Don't Let Your Shoppers Drop; 5 Rules for Today’s eCommerce
Don't Let Your Shoppers Drop; 5 Rules for Today’s eCommerceDon't Let Your Shoppers Drop; 5 Rules for Today’s eCommerce
Don't Let Your Shoppers Drop; 5 Rules for Today’s eCommerce
 
Data Drive Applications_Webinar
Data Drive Applications_WebinarData Drive Applications_Webinar
Data Drive Applications_Webinar
 
Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da...
Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da...Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da...
Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da...
 
Designing a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDesigning a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for Dummies
 
Apache Impala (incubating) 2.5 Performance Update
Apache Impala (incubating) 2.5 Performance UpdateApache Impala (incubating) 2.5 Performance Update
Apache Impala (incubating) 2.5 Performance Update
 
Webinar - Delivering Enhanced Message Processing at Scale With an Always-on D...
Webinar - Delivering Enhanced Message Processing at Scale With an Always-on D...Webinar - Delivering Enhanced Message Processing at Scale With an Always-on D...
Webinar - Delivering Enhanced Message Processing at Scale With an Always-on D...
 
Engaging with Cloudera & Morning Wrap Up
Engaging with Cloudera & Morning Wrap UpEngaging with Cloudera & Morning Wrap Up
Engaging with Cloudera & Morning Wrap Up
 
How to Successfully Visualize DSE Graph data
How to Successfully Visualize DSE Graph dataHow to Successfully Visualize DSE Graph data
How to Successfully Visualize DSE Graph data
 
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
 
Cloudera Breakfast Series, Analytics Part 1: Use All Your Data
Cloudera Breakfast Series, Analytics Part 1: Use All Your DataCloudera Breakfast Series, Analytics Part 1: Use All Your Data
Cloudera Breakfast Series, Analytics Part 1: Use All Your Data
 
End to End Streaming Architectures
End to End Streaming ArchitecturesEnd to End Streaming Architectures
End to End Streaming Architectures
 
Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8
 

Similar to When Rotten Tomatoes Isn't Enough: Analyzing Twitter Movie Reviews using DSE

Reporting from the Trenches: Intuit & Cassandra
Reporting from the Trenches: Intuit & CassandraReporting from the Trenches: Intuit & Cassandra
Reporting from the Trenches: Intuit & CassandraDataStax
 
Apache Cassandra and The Multi-Cloud by Amanda Moran
Apache Cassandra and The Multi-Cloud by Amanda MoranApache Cassandra and The Multi-Cloud by Amanda Moran
Apache Cassandra and The Multi-Cloud by Amanda MoranData Con LA
 
Introducing DataStax Enterprise 4.7
Introducing DataStax Enterprise 4.7Introducing DataStax Enterprise 4.7
Introducing DataStax Enterprise 4.7DataStax
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointInside Analysis
 
PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015Cloudera, Inc.
 
Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketOptions for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketDremio Corporation
 
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyEnterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyInside Analysis
 
Data Science and CDSW
Data Science and CDSWData Science and CDSW
Data Science and CDSWJason Hubbard
 
PyData: The Next Generation
PyData: The Next GenerationPyData: The Next Generation
PyData: The Next GenerationWes McKinney
 
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive AdvantageFueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive AdvantagePrecisely
 
Meet the experts dwo bde vds v7
Meet the experts dwo bde vds v7Meet the experts dwo bde vds v7
Meet the experts dwo bde vds v7mmathipra
 
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data InsightSyncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data InsightSteven Totman
 
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr
Cloudera, Inc.
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database RoundtableEric Kavanagh
 
"Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovatio...
"Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovatio..."Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovatio...
"Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovatio...Dataconomy Media
 
Webinar: Comparing DataStax Enterprise with Open Source Apache Cassandra
Webinar: Comparing DataStax Enterprise with Open Source Apache CassandraWebinar: Comparing DataStax Enterprise with Open Source Apache Cassandra
Webinar: Comparing DataStax Enterprise with Open Source Apache CassandraDataStax
 
Webinar: Get On-Demand Education Anytime, Anywhere with Coursera and DataStax
Webinar: Get On-Demand Education Anytime, Anywhere with Coursera and DataStaxWebinar: Get On-Demand Education Anytime, Anywhere with Coursera and DataStax
Webinar: Get On-Demand Education Anytime, Anywhere with Coursera and DataStaxDataStax
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the OrganizationSeeling Cheung
 

Similar to When Rotten Tomatoes Isn't Enough: Analyzing Twitter Movie Reviews using DSE (20)

Reporting from the Trenches: Intuit & Cassandra
Reporting from the Trenches: Intuit & CassandraReporting from the Trenches: Intuit & Cassandra
Reporting from the Trenches: Intuit & Cassandra
 
Apache Cassandra and The Multi-Cloud by Amanda Moran
Apache Cassandra and The Multi-Cloud by Amanda MoranApache Cassandra and The Multi-Cloud by Amanda Moran
Apache Cassandra and The Multi-Cloud by Amanda Moran
 
Introducing DataStax Enterprise 4.7
Introducing DataStax Enterprise 4.7Introducing DataStax Enterprise 4.7
Introducing DataStax Enterprise 4.7
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter Point
 
PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015
 
Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketOptions for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current Market
 
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyEnterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
 
Data Science and CDSW
Data Science and CDSWData Science and CDSW
Data Science and CDSW
 
Talend for big_data_intorduction
Talend for big_data_intorductionTalend for big_data_intorduction
Talend for big_data_intorduction
 
PyData: The Next Generation
PyData: The Next GenerationPyData: The Next Generation
PyData: The Next Generation
 
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive AdvantageFueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
 
Meet the experts dwo bde vds v7
Meet the experts dwo bde vds v7Meet the experts dwo bde vds v7
Meet the experts dwo bde vds v7
 
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data InsightSyncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight
 
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr

 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
"Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovatio...
"Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovatio..."Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovatio...
"Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovatio...
 
50 Shades of SQL
50 Shades of SQL50 Shades of SQL
50 Shades of SQL
 
Webinar: Comparing DataStax Enterprise with Open Source Apache Cassandra
Webinar: Comparing DataStax Enterprise with Open Source Apache CassandraWebinar: Comparing DataStax Enterprise with Open Source Apache Cassandra
Webinar: Comparing DataStax Enterprise with Open Source Apache Cassandra
 
Webinar: Get On-Demand Education Anytime, Anywhere with Coursera and DataStax
Webinar: Get On-Demand Education Anytime, Anywhere with Coursera and DataStaxWebinar: Get On-Demand Education Anytime, Anywhere with Coursera and DataStax
Webinar: Get On-Demand Education Anytime, Anywhere with Coursera and DataStax
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
 

More from Data Con LA

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA
 

More from Data Con LA (20)

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup Showcase
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendations
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI Ethics
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learning
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentation
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWS
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data Science
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with Kafka
 

Recently uploaded

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 

Recently uploaded (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

When Rotten Tomatoes Isn't Enough: Analyzing Twitter Movie Reviews using DSE

  • 1. When Rotten Tomatoes Isn’t Enough 1 © DataStax, All Rights Reserved. Confidential Analyzing Twitter Movie Reviews using DataStax Enterprise Amanda K Moran Technical Evangelist for DataStax
  • 2. What Are We Talking About Today • What Problem Are We Trying To Solve? • Introduction to Apache Cassandra – What is it? – Why do I need it? • Introduction To Apache Spark – What is it? – Why do I need it? • What is Sentiment Analysis? • How does DataStax Enterprise Analytics Help Us? • Overview of Demo • Actual Demo! 2 © DataStax, All Rights Reserved. Confidential
  • 3. But first… A Little About Amanda • California Native born in Redlands • Graduated with MS in Computer Science and Engineering from Santa Clara University in 2012 • Worked as a Software Engineer for 6 years at Lockheed Martin, HP, Teradata, Esgyn and now DataStax as a Technical Evangelist • Apache Committer, PMC Member, and initial contributor to all installation and deployment work for Apache Trafodion • Keywords: Disney, Cloud, Dogs, Veggies, Linux, Databases, Big Data, Analytics, Testing, and Running © DataStax, All Rights Reserved.3 Confidential
  • 4. What Problem Are We Really Trying to Solve? 4 © DataStax, All Rights Reserved. Confidential
  • 5. What Movie Should I See? • Wouldn’t it be great if I could ask 1 million people this question? • Wouldn’t it be great if I could automate this process? • Data Analytics doesn’t have to be complicated! • We are going to utilize the power of Big Data using – Apache Cassandra – Apache Spark – Spark Machine Learning library – Jupyter notebooks – Python – Twitter Tweets and API – Pattern for Sentiment Analysis © DataStax, All Rights Reserved. Confidential5
  • 6. (Brief) Introduction to Apache Cassandra 6 © DataStax, All Rights Reserved. Confidential
  • 7. What is Apache Cassandra? • First developed by Facebook • Became a top-level Apache Foundation project in 2010 • Distributed, decentralized database • Elastic scalability -- add/remove nodes with no downtime • High performance -- very fast • High availability / fault tolerant -- no single point of failure • Solves many of the problems faced with a traditional DB for certain workloads © DataStax, All Rights Reserved. Confidential7
  • 8. What is DataStax Enterprise? • DataStax has been some of the key contributors to the Cassandra project – DataStax Enterprise is a commercial product that provides • More cool features • More QA • More support © DataStax, All Rights Reserved. Confidential8
  • 9. What Does All This Mean? • Let’s talk about 4 Big Topics: – Distributed – Replication – Elastically Scalable – High Availability © DataStax, All Rights Reserved. Confidential9 Note: Don’t forget this is just a brief intro!
  • 10. Distributed • Every node in the cluster has the same role – Really! – Cassandra does not have a Master-Worker Architecture • Any client can connect to any node – All nodes are Read and Write ready • But this is not to say that all nodes contain all data © DataStax, All Rights Reserved. Confidential10
  • 11. Replication • To be able to survive a node going down data must be copied to other nodes • The Replication Factor (RF) is set by the user – 1-Number of nodes in the Cluster (not recommended) • The data is asynchronously replicated – Automatic – Peer-to-peer communication © DataStax, All Rights Reserved. Confidential11
  • 12. Elastically Scalable • As more nodes are added, performance increases linearly • You can scale up or down with no downtime – Not even a restart! • Reads and Writes both scale © DataStax, All Rights Reserved. Confidential12
  • 13. High Availability • The lack of a Master node allows for high availability – No single point of failure • Replication allows nodes to fail and data to still be available – Cassandra expects nodes to fail and doesn’t panic • Multiple Data Center support is out of the box © DataStax, All Rights Reserved. Confidential13
  • 14. One Small Trade Off • The CAP Theorem – Availability – Consistency – Partitioning • You can’t have it all --Impossible – Cassandra chooses to have eventual consistency as the default • But you can prioritize consistency over availability – Consistency levels are configurable! © DataStax, All Rights Reserved. Confidential14
  • 15. Why Do I Need Cassandra? • Think about your application: – Do you have Big Data (a lot of it!)? – Do you need to be able to read/write fast? – Do you need to be able to scale up/down easily? – Do you need High Availability? – Do you need multiple data center support? • Multi-cloud/hybrid cloud support? If you application needs any of these things, you want to consider Cassandra! © DataStax, All Rights Reserved. Confidential15
  • 16. (Small) Introduction to Apache Spark 16 © DataStax, All Rights Reserved. Confidential
  • 17. What is Apache Spark? • “Apache Spark is a unified analytics engine for large-scale data processing” --https://spark.apache.org • 100 times faster than Hadoop for analytics – Utilizes in-memory processing – Amazing parallelism • Machine Learning library -- Spark MLlib © DataStax, All Rights Reserved. Confidential17
  • 18. Why Do I Need Spark? • Think about your application: – Do you have Big Data? – Do you need High Availability? – Do you need analytics at lightning speed? – Do you need a simple way to get insights into your data? If you application needs any of these things, you want to consider Spark! • © DataStax, All Rights Reserved. Confidential18
  • 19. What is Sentiment Analysis? 19 © DataStax, All Rights Reserved. Confidential
  • 20. What is Sentiment Analysis? • Sentiment Analysis at a high level is very simple • Natural Language processing and text analytics to determine if a word or sentence – Positive – Negative – Neutral • This is easy to understand, but difficult for machines to learn how to do! © DataStax, All Rights Reserved. Confidential20
  • 21. How does DataStax Enterprise Analytics Help Us? 21 © DataStax, All Rights Reserved. Confidential
  • 22. Cassandra vs DSE Analytics 22 © DataStax, All Rights Reserved. Confidential Cassandra DSE Analytics Spark integration No -- connectors available Yes High Availability HDFS, Spark Resource Manager No Single Point of Failure Support Jira, emails, google! Dedicated support Deployment Manual installation DSE Ops Center One unified platform to do all this complex work!
  • 23. Demo Overview 23 © DataStax, All Rights Reserved. Confidential
  • 24. Analyzing Twitter with DSE Analytics and Jupyter • Local DSE Analytics Setup • Local Jupyter Setup • Pull twitter data on a movie title • Clean up the tweets • Insert into Cassandra • Create Spark Dataframe • Use Spark ML • Sentiment Analysis with Pattern – Gives positive/negative • Take average of these scores • Should I see this movie?? © DataStax, All Rights Reserved. Confidential24
  • 25. DemoCassandra, Spark, Python, Jupyter Notebooks, Cassandra Python driver, Tweets, Twitter API, and Pattern © DataStax, All Rights Reserved. Confidential25
  • 26. Information and Links • Get this demo: https://github.com/amandamoran/bigdatadayla – More analytics to come! • Learn more about Cassandra: https://academy.datastax.com/ • Learn more about Spark: https://spark.apache.org/ • Learn more about DataStax: https://www.datastax.com/ • Follow me on Twitter: @AmandaDataStax © DataStax, All Rights Reserved. Confidential26
  • 27. Thank you 27 © DataStax, All Rights Reserved. Confidential