SlideShare a Scribd company logo
Powering Social Business Intelligence
Cassandra and Hadoop at the Dachis Group
Social Business Wha?
Big Data meets Big Budgets




• Brand marketers spend
  • $450 (~£270) billion annually on tradition media
  • $50 (~£30) billion annually on SEO/SEM
• Starting to transition to social media
Effectiveness - Traditional
Measure all the things!




            Measuring traditional marketing effectiveness
Effectiveness - Social
Measure all the things!




             Measuring social marketing effectiveness
The Dachis Group
Measure all the things!




• Jeff Dachis amasses small army of social strategists
• Funds team to create social analytics platform
  • Measure business outcomes of social media strategies
  • Track social media surrounding Forbes Global 2000
  • Include all brands, all subsidiaries, all social media types
Architecture

• Raw data in S3
• Cassandra
  • Realtime queries to return raw data
  • Hadoop analytic integration for foundational measures
  • Horizontal scalability
  • Operationally simple
• RDBMS
  • Time rollups of measures
  • Aggregates and composite measures
  • Arbitrary dimensional queries
  • Mini data warehouse
Pipeline



                                                            Memcached
   AWS S3                       Cassandra                    Postgres




   Raw Signal                      Signal                       Metrics
    Storage                      Repository                      Store




                Normalization           Enrichment   Analysis
Normalization




• Parallel copy from S3 to HDFS
• MapReduce to Cassandra from Raw to Normalized CF
• Normalized data model
  • Decent investment to get right
  • Mostly for conceptual reasons rather than concerns about queries
  • Secondary indexes vs app maintained indexes
Enrichment



• Enrich with
  • Unique company/brand information
  • Sentiment
  • Relationships
  • Conversations
  • Social graph information
• Enter Pig
• Enter Oozie
The Bleeding Edge
Pig


 • newlogicalplan in 0.8.0
 • Debugging/tracing?
 • Incremental development
 • Working with Cassandra
      • Pygmalion - facilitating to and from Cassandra
 • Experience, unit test framework, UDFs, community



                                slowly became
The Bleeding Edge
Oozie
• Learning curve and common errors
  • User impersonation
  • Logs, we haz them, lots of them
  • Web UI needs love
• Specific to Cassandra
  • mapreduce.fileoutputcommitter.marksuccessfuljobs
  • See http://wiki.apache.org/cassandra/HadoopSupport#Oozie
• Still very good DAG workflow crunching tool
  • Subworkflows, fork/join, regular scheduling, dataset detection
  • Extensible
  • Apache Incubator (@oozie on twitter, #oozie on freenode)
The Bleeding Edge
Cassandra

 • Rack aware snitch and replication
   • Always rotate racks in order in topology
   • In EC2 this likely means rotate AZs
 • Dealing with scanning over column families
   • Project early
 • General tuning and unique workload
   • Mahout and other higher memory hadoop tasks
   • EC2 instance types
 • Visualization tool helped (OpsCenter, Acunu has Control Center)
 • Community++
Social Business Index
Launches September 2011


                          • Global Ranking of Companies
                          • Industry Rankings
                          • Visualization of strategy
This might actually work!


 • Fall 2011, built up the team
 • Expertise in Pig, Lucene/Solr, machine learning, statistics, event
  prediction and analysis
 • Making everlasting gobstoppers
Social Performance Monitor
The measures behind the score
Topics topics topics




• Black Friday
  • Science project!
  • Mallet, Pig
  • Custom analysis
• Superbowl
• Oscars
Productizing Topics



• Ongoing automated topic detection
• Lessons from one-off topic analysis
• Represented by term distributions
• Threads with detail like
  • Signal volume
  • Participants
  • Links
  • Sentiment gauge
Advocates




• Auto-discovery of potential advocates
• Curated set of known advocates
• Example signal (from Cassandra)
• Reports and other useful bits
Lessons learned




• Emerging products are sometimes frustrating, but well worth the pain in
 their respective niche.
• “Never underestimate the massive impact of small bugs in big
 data.” (@peteskomoroch at LinkedIn)
• Community karma
A Note on Community

• Community involvement
  • IRC, mailing lists, twitter, conferences, meetups
  • Newer projects have little or outdated docs
  • Some features may be
    • Deprecated
    • Not ready for primetime
    • Not a fit for your use case
• Community karma
  • Don’t just take
  • Be a bridge builder
  • Positive karma helps
Questions?




• We’re hiring
• Ping me @jeromatron (Twitter and IRC)

More Related Content

What's hot

Cloud Optimized Big Data
Cloud Optimized Big DataCloud Optimized Big Data
Cloud Optimized Big Data
Joydeep Sen Sarma
 
Hadoop and HBase @eBay
Hadoop and HBase @eBayHadoop and HBase @eBay
Hadoop and HBase @eBay
DataWorks Summit
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About Time
MapR Technologies
 
The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.
The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.
The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.
Data Con LA
 
Qubole @ AWS Meetup Bangalore - July 2015
Qubole @ AWS Meetup Bangalore - July 2015Qubole @ AWS Meetup Bangalore - July 2015
Qubole @ AWS Meetup Bangalore - July 2015
Joydeep Sen Sarma
 
Qubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant ConferenceQubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant ConferenceJoydeep Sen Sarma
 
Summer Shorts: Big Data Integration
Summer Shorts: Big Data IntegrationSummer Shorts: Big Data Integration
Summer Shorts: Big Data Integration
ibi
 
Qubole - Big data in cloud
Qubole - Big data in cloudQubole - Big data in cloud
Qubole - Big data in cloud
Dmitry Tolpeko
 
Big data advance topics - part 2.pptx
Big data   advance topics - part 2.pptxBig data   advance topics - part 2.pptx
Big data advance topics - part 2.pptx
Moldovan Radu Adrian
 
Hadoop distributions - ecosystem
Hadoop distributions - ecosystemHadoop distributions - ecosystem
Hadoop distributions - ecosystem
Jakub Stransky
 
Data Engineering Quick Guide
Data Engineering Quick GuideData Engineering Quick Guide
Data Engineering Quick Guide
Asim Jalis
 
Proud to be Polyglot - Riviera Dev 2015
Proud to be Polyglot - Riviera Dev 2015Proud to be Polyglot - Riviera Dev 2015
Proud to be Polyglot - Riviera Dev 2015
Tugdual Grall
 
Big Data A La Carte Menu
Big Data A La Carte MenuBig Data A La Carte Menu
Big Data A La Carte Menu
Venkatesh Balakumar
 
Not Just Another Overview of Apache Hadoop
Not Just Another Overview of Apache HadoopNot Just Another Overview of Apache Hadoop
Not Just Another Overview of Apache Hadoop
Adaryl "Bob" Wakefield, MBA
 
Hadoop @ eBay: Past, Present, and Future
Hadoop @ eBay: Past, Present, and FutureHadoop @ eBay: Past, Present, and Future
Hadoop @ eBay: Past, Present, and Future
Ryan Hennig
 
The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsight
Gert Drapers
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
Patrick Nicolas
 
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
Spark Summit
 
Concepts on Hadoop
Concepts on HadoopConcepts on Hadoop
Concepts on Hadoop
Christopher Sharkey
 

What's hot (20)

Cloud Optimized Big Data
Cloud Optimized Big DataCloud Optimized Big Data
Cloud Optimized Big Data
 
Hadoop and HBase @eBay
Hadoop and HBase @eBayHadoop and HBase @eBay
Hadoop and HBase @eBay
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About Time
 
The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.
The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.
The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.
 
Qubole @ AWS Meetup Bangalore - July 2015
Qubole @ AWS Meetup Bangalore - July 2015Qubole @ AWS Meetup Bangalore - July 2015
Qubole @ AWS Meetup Bangalore - July 2015
 
Qubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant ConferenceQubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant Conference
 
Summer Shorts: Big Data Integration
Summer Shorts: Big Data IntegrationSummer Shorts: Big Data Integration
Summer Shorts: Big Data Integration
 
Qubole - Big data in cloud
Qubole - Big data in cloudQubole - Big data in cloud
Qubole - Big data in cloud
 
Big data advance topics - part 2.pptx
Big data   advance topics - part 2.pptxBig data   advance topics - part 2.pptx
Big data advance topics - part 2.pptx
 
Hadoop distributions - ecosystem
Hadoop distributions - ecosystemHadoop distributions - ecosystem
Hadoop distributions - ecosystem
 
Data Engineering Quick Guide
Data Engineering Quick GuideData Engineering Quick Guide
Data Engineering Quick Guide
 
Proud to be Polyglot - Riviera Dev 2015
Proud to be Polyglot - Riviera Dev 2015Proud to be Polyglot - Riviera Dev 2015
Proud to be Polyglot - Riviera Dev 2015
 
Big Data A La Carte Menu
Big Data A La Carte MenuBig Data A La Carte Menu
Big Data A La Carte Menu
 
Hadoop-2 @ eBay
Hadoop-2 @ eBayHadoop-2 @ eBay
Hadoop-2 @ eBay
 
Not Just Another Overview of Apache Hadoop
Not Just Another Overview of Apache HadoopNot Just Another Overview of Apache Hadoop
Not Just Another Overview of Apache Hadoop
 
Hadoop @ eBay: Past, Present, and Future
Hadoop @ eBay: Past, Present, and FutureHadoop @ eBay: Past, Present, and Future
Hadoop @ eBay: Past, Present, and Future
 
The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsight
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
 
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
 
Concepts on Hadoop
Concepts on HadoopConcepts on Hadoop
Concepts on Hadoop
 

Similar to Cassandra eu

Hydra - Content Processing Framework for Search Driven Solutions
Hydra - Content Processing Framework for Search Driven SolutionsHydra - Content Processing Framework for Search Driven Solutions
Hydra - Content Processing Framework for Search Driven Solutions
Findwise
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
Adaryl "Bob" Wakefield, MBA
 
Hydra Project Management Survey
Hydra Project Management SurveyHydra Project Management Survey
Hydra Project Management Survey
Mark Notess
 
Modul_1_Introduction_to_Big_Data.pptx
Modul_1_Introduction_to_Big_Data.pptxModul_1_Introduction_to_Big_Data.pptx
Modul_1_Introduction_to_Big_Data.pptx
NouhaElhaji1
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Andrew Brust
 
Apache drill
Apache drillApache drill
Apache drill
MapR Technologies
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Perficient, Inc.
 
Large scale computing
Large scale computing Large scale computing
Large scale computing
Bhupesh Bansal
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
Institute of Contemporary Sciences
 
Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...
Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...
Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...
Cloudera, Inc.
 
Bi with apache hadoop(en)
Bi with apache hadoop(en)Bi with apache hadoop(en)
Bi with apache hadoop(en)
Alexander Alten
 
Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketOptions for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current Market
Dremio Corporation
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They Need
Dunn Solutions Group
 
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Open Analytics
 
Open Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenOpen Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenChristopher Whitaker
 
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
BigDataEverywhere
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which
DataWorks Summit
 
Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014
Miguel Pastor
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist Toolbox
Andrei Savu
 

Similar to Cassandra eu (20)

Hydra - Content Processing Framework for Search Driven Solutions
Hydra - Content Processing Framework for Search Driven SolutionsHydra - Content Processing Framework for Search Driven Solutions
Hydra - Content Processing Framework for Search Driven Solutions
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
Hydra Project Management Survey
Hydra Project Management SurveyHydra Project Management Survey
Hydra Project Management Survey
 
Modul_1_Introduction_to_Big_Data.pptx
Modul_1_Introduction_to_Big_Data.pptxModul_1_Introduction_to_Big_Data.pptx
Modul_1_Introduction_to_Big_Data.pptx
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World
 
Apache drill
Apache drillApache drill
Apache drill
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
 
Large scale computing
Large scale computing Large scale computing
Large scale computing
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 
Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...
Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...
Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic...
 
Bi with apache hadoop(en)
Bi with apache hadoop(en)Bi with apache hadoop(en)
Bi with apache hadoop(en)
 
Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketOptions for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current Market
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They Need
 
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
 
Open Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenOpen Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe Olsen
 
Ds01 data science
Ds01   data scienceDs01   data science
Ds01 data science
 
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which
 
Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist Toolbox
 

More from Jeremy Hanna

Göteborg Distributed: Eventual Consistency in Apache Cassandra
Göteborg Distributed: Eventual Consistency in Apache CassandraGöteborg Distributed: Eventual Consistency in Apache Cassandra
Göteborg Distributed: Eventual Consistency in Apache Cassandra
Jeremy Hanna
 
Apache Cassandra in the Real World
Apache Cassandra in the Real WorldApache Cassandra in the Real World
Apache Cassandra in the Real World
Jeremy Hanna
 
Apache Cassandra in the Real World
Apache Cassandra in the Real WorldApache Cassandra in the Real World
Apache Cassandra in the Real World
Jeremy Hanna
 
Modern Cassandra for Developers
Modern Cassandra for DevelopersModern Cassandra for Developers
Modern Cassandra for DevelopersJeremy Hanna
 
Troubleshooting Cassandra
Troubleshooting CassandraTroubleshooting Cassandra
Troubleshooting Cassandra
Jeremy Hanna
 
Cassandra + Hadoop: Analisi Batch con Apache Cassandra
Cassandra + Hadoop: Analisi Batch con Apache CassandraCassandra + Hadoop: Analisi Batch con Apache Cassandra
Cassandra + Hadoop: Analisi Batch con Apache CassandraJeremy Hanna
 
End-to-end Analytics with Apache Cassandra
End-to-end Analytics with Apache CassandraEnd-to-end Analytics with Apache Cassandra
End-to-end Analytics with Apache CassandraJeremy Hanna
 
Pig with Cassandra: Adventures in Analytics
Pig with Cassandra: Adventures in AnalyticsPig with Cassandra: Adventures in Analytics
Pig with Cassandra: Adventures in Analytics
Jeremy Hanna
 
Cassandra/Hadoop Integration
Cassandra/Hadoop IntegrationCassandra/Hadoop Integration
Cassandra/Hadoop Integration
Jeremy Hanna
 
Cassandra + Hadoop @ApacheCon
Cassandra + Hadoop @ApacheCon Cassandra + Hadoop @ApacheCon
Cassandra + Hadoop @ApacheCon Jeremy Hanna
 
Cassandra+Hadoop
Cassandra+HadoopCassandra+Hadoop
Cassandra+Hadoop
Jeremy Hanna
 

More from Jeremy Hanna (11)

Göteborg Distributed: Eventual Consistency in Apache Cassandra
Göteborg Distributed: Eventual Consistency in Apache CassandraGöteborg Distributed: Eventual Consistency in Apache Cassandra
Göteborg Distributed: Eventual Consistency in Apache Cassandra
 
Apache Cassandra in the Real World
Apache Cassandra in the Real WorldApache Cassandra in the Real World
Apache Cassandra in the Real World
 
Apache Cassandra in the Real World
Apache Cassandra in the Real WorldApache Cassandra in the Real World
Apache Cassandra in the Real World
 
Modern Cassandra for Developers
Modern Cassandra for DevelopersModern Cassandra for Developers
Modern Cassandra for Developers
 
Troubleshooting Cassandra
Troubleshooting CassandraTroubleshooting Cassandra
Troubleshooting Cassandra
 
Cassandra + Hadoop: Analisi Batch con Apache Cassandra
Cassandra + Hadoop: Analisi Batch con Apache CassandraCassandra + Hadoop: Analisi Batch con Apache Cassandra
Cassandra + Hadoop: Analisi Batch con Apache Cassandra
 
End-to-end Analytics with Apache Cassandra
End-to-end Analytics with Apache CassandraEnd-to-end Analytics with Apache Cassandra
End-to-end Analytics with Apache Cassandra
 
Pig with Cassandra: Adventures in Analytics
Pig with Cassandra: Adventures in AnalyticsPig with Cassandra: Adventures in Analytics
Pig with Cassandra: Adventures in Analytics
 
Cassandra/Hadoop Integration
Cassandra/Hadoop IntegrationCassandra/Hadoop Integration
Cassandra/Hadoop Integration
 
Cassandra + Hadoop @ApacheCon
Cassandra + Hadoop @ApacheCon Cassandra + Hadoop @ApacheCon
Cassandra + Hadoop @ApacheCon
 
Cassandra+Hadoop
Cassandra+HadoopCassandra+Hadoop
Cassandra+Hadoop
 

Recently uploaded

Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
Jen Stirrup
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 

Recently uploaded (20)

Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 

Cassandra eu

  • 1. Powering Social Business Intelligence Cassandra and Hadoop at the Dachis Group
  • 2. Social Business Wha? Big Data meets Big Budgets • Brand marketers spend • $450 (~£270) billion annually on tradition media • $50 (~£30) billion annually on SEO/SEM • Starting to transition to social media
  • 3. Effectiveness - Traditional Measure all the things! Measuring traditional marketing effectiveness
  • 4. Effectiveness - Social Measure all the things! Measuring social marketing effectiveness
  • 5. The Dachis Group Measure all the things! • Jeff Dachis amasses small army of social strategists • Funds team to create social analytics platform • Measure business outcomes of social media strategies • Track social media surrounding Forbes Global 2000 • Include all brands, all subsidiaries, all social media types
  • 6. Architecture • Raw data in S3 • Cassandra • Realtime queries to return raw data • Hadoop analytic integration for foundational measures • Horizontal scalability • Operationally simple • RDBMS • Time rollups of measures • Aggregates and composite measures • Arbitrary dimensional queries • Mini data warehouse
  • 7. Pipeline Memcached AWS S3 Cassandra Postgres Raw Signal Signal Metrics Storage Repository Store Normalization Enrichment Analysis
  • 8. Normalization • Parallel copy from S3 to HDFS • MapReduce to Cassandra from Raw to Normalized CF • Normalized data model • Decent investment to get right • Mostly for conceptual reasons rather than concerns about queries • Secondary indexes vs app maintained indexes
  • 9. Enrichment • Enrich with • Unique company/brand information • Sentiment • Relationships • Conversations • Social graph information • Enter Pig • Enter Oozie
  • 10. The Bleeding Edge Pig • newlogicalplan in 0.8.0 • Debugging/tracing? • Incremental development • Working with Cassandra • Pygmalion - facilitating to and from Cassandra • Experience, unit test framework, UDFs, community slowly became
  • 11. The Bleeding Edge Oozie • Learning curve and common errors • User impersonation • Logs, we haz them, lots of them • Web UI needs love • Specific to Cassandra • mapreduce.fileoutputcommitter.marksuccessfuljobs • See http://wiki.apache.org/cassandra/HadoopSupport#Oozie • Still very good DAG workflow crunching tool • Subworkflows, fork/join, regular scheduling, dataset detection • Extensible • Apache Incubator (@oozie on twitter, #oozie on freenode)
  • 12. The Bleeding Edge Cassandra • Rack aware snitch and replication • Always rotate racks in order in topology • In EC2 this likely means rotate AZs • Dealing with scanning over column families • Project early • General tuning and unique workload • Mahout and other higher memory hadoop tasks • EC2 instance types • Visualization tool helped (OpsCenter, Acunu has Control Center) • Community++
  • 13. Social Business Index Launches September 2011 • Global Ranking of Companies • Industry Rankings • Visualization of strategy
  • 14. This might actually work! • Fall 2011, built up the team • Expertise in Pig, Lucene/Solr, machine learning, statistics, event prediction and analysis • Making everlasting gobstoppers
  • 15. Social Performance Monitor The measures behind the score
  • 16. Topics topics topics • Black Friday • Science project! • Mallet, Pig • Custom analysis • Superbowl • Oscars
  • 17. Productizing Topics • Ongoing automated topic detection • Lessons from one-off topic analysis • Represented by term distributions • Threads with detail like • Signal volume • Participants • Links • Sentiment gauge
  • 18. Advocates • Auto-discovery of potential advocates • Curated set of known advocates • Example signal (from Cassandra) • Reports and other useful bits
  • 19. Lessons learned • Emerging products are sometimes frustrating, but well worth the pain in their respective niche. • “Never underestimate the massive impact of small bugs in big data.” (@peteskomoroch at LinkedIn) • Community karma
  • 20. A Note on Community • Community involvement • IRC, mailing lists, twitter, conferences, meetups • Newer projects have little or outdated docs • Some features may be • Deprecated • Not ready for primetime • Not a fit for your use case • Community karma • Don’t just take • Be a bridge builder • Positive karma helps
  • 21. Questions? • We’re hiring • Ping me @jeromatron (Twitter and IRC)

Editor's Notes

  1. \n
  2. \n
  3. This is how they see their capability after, for example, the Superbowl.\n
  4. When managers ask how effective the campaign was, the marketing department says it was awesome. When asked how they know that, they say that Zoltar told them so. In reality there are a lot of home grown methods, some good, some not so good. Some of what we did grew out of a spreadsheet that was manually updated, validated and refined over time with one of our major customers.\n\n
  5. What brands does Berkshire Hathaway have under its gigantic umbrella?!?\nCan mention Red Bull, Disney, HP, Levis, Samsung, Honda, etc.\n
  6. Operationally simple doesn’t mean that you don’t need to learn a lot about it, just that there aren’t a lot of moving parts.\nUnique use case in that it’s hybrid. Both lots of writes and analytics and reads.\n
  7. \n
  8. It’s just scads of text, but we do classify - conversations long/short difference between microblogs and blogs.\nWe may use hadoop to generate alternate CFs for specific queries as we need them.\n
  9. Company information is unique because we had to buy, borrow, steal and yes crowd source that data.\nPig handles joins really well for example account snapshots and signal for enrichment.\n\n
  10. Mention Brandon’s work to make things better with CassandraStorage and newer versions of Pig, including regression tests.\nSpeculative exectution.\n
  11. Mention having looked at Azkaban as well.\nNo real way around the logs, just takes getting used to. User impersonation is a product of the authorization framework, patch added to DSE.\n
  12. Mention consistency level choices.\nRotate racks - yeah, wasn’t documented except in the code.\nBackup/restore.\nRoot causes sometimes difficult to determine.\nScaling up - each order of magnitude jump has its own problems.\n
  13. But the long sleepless Summer finally pays off...\n
  14. Everlasting gobstoppers are a fun phase for the projects.\n
  15. Reveals numbers\n
  16. Explanation\nGreat working as a team\nMention Boxing Day\n
  17. Also customer curated topics in the future\n
  18. \n
  19. Data consistency - periodic checks, staging cluster, unit tests, integration testing.\nReparable data. Sometimes incredibly painful, but possible.\nMention backup/restore.\nMention root causes.\n
  20. Be active in communities of these new projects\nIf necessary start building communities around them\nDon’t just take, answer questions, follow mailing lists but have a filter, docs, bug submission, feature requests, votes, representation, tests, patches/pull requests.\n
  21. \n