Confidential and Proprietary to Daugherty Business Solutions
So You Don't Have an Admin Team -
Doing Big Data using Amazon's analogs
Adam Doyle
StampedeCon 2017
EIM and Analytics
Data Science
• Predictive and Prescriptive Analytics
• Social, Text and Sentiment Analytics
• Natural Language Processing
• Machine Learning, Artificial Intelligence
• SPSS, SAS, R, IBM Watson™
Strategy and Competency Building
• Build the right, comprehensive solution blueprint across 12 Domains
• Establish a specific, actionable plan and ROIs
• Protecting your investments
• Organization, Talent, Competency
• Processes, Methods, Techniques, Tools
• Speed – Agile EIM Transformation
• Governance processes
Customer and Business Analytics
• Customer/Buyer/Channel Segmentation
• Persona Development, Customer Scoring (Value, Potential)
• Attrition Modeling, Engagement and Response Modeling
• Inventory Management, Marketing Campaigns
• Product Design Analytics, Workforce Planning, Location-Based Advertising
• Data Monetization
Traditional Data Warehouse and Business Intelligence
• EDW, ODS, Data Mart and Integration
• Master Data Management
• Data Governance
• Dashboards, Scorecards
• Reports, Alerts
• Multidimensional Analysis
• Ad hoc slicing and dicing
• Self Service Enablement
• Cloud Migration and Agile EIM
400+ employees strong
Digital Engagement/Analytics
• Customer Engagement Strategies
• Omni-channel and Integrated Marketing
• Strategic Planning, Building and Executing Digital and Customer Engagement Solutions
Big Data and Next Generation Technologies
• Data Lab Development Centers
• Data Lakes, Analytic Platforms
• Hadoop (Cloudera, Hortonworks)
• NoSQL / Graph DB (MongoDB, DataStax)
• Cloud platforms (AWS, Google, Azure)
• Spark, Sqoop, Hive, Pig, Kafka, etc.
Meet Adam Doyle
• 20-year veteran of the St. Louis IT community
• Co-Organizer, St. Louis Hadoop User Group
• Big Data Community Lead, Daugherty Business Solutions
• Formerly Big Data Solution Architect at Amitech, Lead Big Data developer at Mercy
• Speaker at local and national Big Data conferences
Problem Statement
You are developing an Internet of Things solution for a kitchen appliance manufacturer. Essentially, you are trying to answer the eternal question:
[Image: "Is your refrigerator running?" Source: http://www.routercheck.com/2014/01/27/is-your-refrigerator-running/]
Let’s say you wanted to do Big Data on AWS
• Great! You’ve got options
– Hadoop on EC2 with a distribution
– Hadoop on EMR with a distribution
– Hadoop on EMR with Amazon’s Hadoop version
So what would that look like?
[Diagram: a Hadoop stack. Components shown: API, Flume, and Kafka clients; NiFi; Kafka; Spark Streaming; HBase, Hive, and HDFS; Mahout; Spark MLlib; Spark SQL; Solr.]
EC2
• Virtual machines in the cloud
• Choice of many different options
– Operating system
– Processors
– Memory
– Disk sizes
• Can be created in minutes
• Can be created through code
• Can be turned off when not needed to reduce costs
The downsides
• All of these options require that you have a Hadoop administrator who can tweak the installation for performance.
• Your servers generally need to be up and running, so you are paying for them even when they are not heavily utilized.
Or …
• You can use Amazon’s services to roll your own Big Data application
http://www.writingfordesigners.com/?p=19906
Ingest
[Diagram: ingest on AWS. Components shown: API Gateway, Flume client, AWS IoT, AWS Greengrass, and Lambda functions.]
API Gateway
Hadoop-side analog: API Client
• Three-step process to set up an API
– Define the API
– Create the client
– Create the server
• Wizard to help define the API
• Connects to Lambda, DynamoDB, EC2, S3
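As a rough illustration of the "create the server" step, the sketch below shows a Python Lambda function sitting behind API Gateway's Lambda proxy integration. It is a minimal sketch under assumptions, not code from the talk: the `temperature_f` field, the 40°F threshold, and the function name are hypothetical, while the `statusCode` / `headers` / `body` response shape is what the proxy integration expects back.

```python
import json

# Hypothetical threshold: a warmer refrigerator is treated as "not running".
MAX_SAFE_TEMP_F = 40.0


def handler(event, context):
    """Lambda entry point for an API Gateway proxy-integration request.

    With proxy integration, the HTTP request body arrives as a string
    in event["body"] (or None when the request has no body).
    """
    try:
        reading = json.loads(event.get("body") or "{}")
        temperature = float(reading["temperature_f"])  # hypothetical field name
    except (KeyError, TypeError, ValueError):
        return {
            "statusCode": 400,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"error": "temperature_f is required"}),
        }

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"running": temperature <= MAX_SAFE_TEMP_F}),
    }
```

The function can be exercised locally by passing a dictionary with a `body` key, which makes it easy to test before wiring it to an API.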
Lambda
• Serverless code execution
• No servers to provision or manage
• Event-trigger based
• You pay only for code execution time
• Automatic scaling up to user-defined thresholds
• Currently only a few languages supported (Node.js, Java, Python, and C#)
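To make "you pay only for code execution time" concrete, here is a small estimator. It assumes the usual Lambda billing model of compute time measured in GB-seconds plus a per-request charge; the prices are parameters, no real price list is embedded, and any free-tier allowance is ignored.

```python
def lambda_monthly_cost(invocations, avg_duration_s, memory_mb,
                        price_per_gb_second, price_per_request):
    """Estimate a monthly Lambda bill from compute time plus request count.

    Compute time is billed in GB-seconds: seconds of execution multiplied
    by the memory configured for the function, expressed in GB.
    Both prices are caller-supplied assumptions, not published rates.
    """
    gb_seconds = invocations * avg_duration_s * (memory_mb / 1024)
    return gb_seconds * price_per_gb_second + invocations * price_per_request
```

With zero invocations the estimate is zero, which is the contrast with an always-on cluster that the talk is drawing.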
AWS IoT
• Device Gateway
• Message Broker
• Rules Engine
• Security and Identity Service
• Thing Registry
• Thing Shadow
• Thing Shadows Service
• Integrations with other AWS components
• Processing SDK
• Device SDK
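A Thing Shadow is a JSON document that holds a device's `desired` and `reported` state. The sketch below is a simplified stand-in for the comparison the shadow service performs, handling flat attributes only; the attribute names are hypothetical and nested state is not covered.

```python
def shadow_delta(shadow):
    """Return the desired settings that the device has not yet reported.

    Expects a document shaped like
    {"state": {"desired": {...}, "reported": {...}}}.
    Only top-level attributes are compared in this sketch.
    """
    state = shadow.get("state", {})
    desired = state.get("desired", {})
    reported = state.get("reported", {})
    return {
        key: value
        for key, value in desired.items()
        if key not in reported or reported[key] != value
    }


# Hypothetical refrigerator shadow: the target temperature has not been applied yet.
example_shadow = {
    "state": {
        "desired": {"target_temp_f": 37, "light": "off"},
        "reported": {"target_temp_f": 41, "light": "off"},
    }
}
```

Calling `shadow_delta(example_shadow)` leaves only the `target_temp_f` entry, the one setting the device still has to act on.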
AWS Greengrass
Hadoop-side analog: NiFi
• Extends the functions of AWS IoT to intermittently connected devices
• Devices connect to a local Greengrass core
• Core connects to server when connection is present
Processing data in real-time
[Diagram: real-time processing on AWS. Components shown: API Gateway, AWS IoT, Flume client, Lambda functions, Kinesis, SQS.]
Kinesis
Hadoop-side analog: Kafka
• Publish/subscribe-style streaming service; streams play the role of Kafka topics
• Producer and consumer bandwidth can be resized dynamically
• Cleans up after itself: records expire after 24 hours by default
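When a Lambda function consumes a Kinesis stream, each record's payload arrives base64-encoded under `Records[].kinesis.data`. The sketch below decodes such an event, assuming the producers wrote JSON; the payload fields in the usage note are hypothetical.

```python
import base64
import json


def decode_kinesis_records(event):
    """Return the JSON payloads carried by a Kinesis-triggered Lambda event."""
    payloads = []
    for record in event.get("Records", []):
        # Kinesis delivers the raw bytes base64-encoded inside the event.
        raw = base64.b64decode(record["kinesis"]["data"])
        payloads.append(json.loads(raw.decode("utf-8")))
    return payloads
```

A test event can be built by base64-encoding a JSON string such as `{"device_id": "fridge-1", "temperature_f": 38}` and placing it in the `data` field.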
Simple Queue Service (SQS)
Hadoop-side analog: Spark Streaming
• Queue-based service
• Destructive reads
Standard Queue          | FIFO Queue
High throughput         | Limited throughput (300 TPS)
At-Least-Once Delivery  | Exactly-Once Processing
Best-Effort Ordering    | First-In-First-Out Delivery
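At-least-once delivery on a standard queue means a consumer can occasionally see the same message twice, so handlers are usually made idempotent. The sketch below skips repeated message IDs. It assumes messages shaped like the `MessageId` / `Body` dictionaries SQS returns, and the in-memory set is a placeholder for whatever durable store a real consumer would use.

```python
def process_once(messages, handle, seen_ids=None):
    """Apply handle() to each message body, skipping message IDs already seen.

    Returns the set of processed IDs so it can be passed to the next batch.
    """
    seen_ids = set() if seen_ids is None else seen_ids
    for message in messages:
        message_id = message["MessageId"]
        if message_id in seen_ids:
            continue  # duplicate delivery from an at-least-once queue
        handle(message["Body"])
        seen_ids.add(message_id)
    return seen_ids
```

With a FIFO queue the deduplication happens on the service side, at the price of the lower throughput shown in the comparison above.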
Data Pipeline
• Scheduled batch operations
• WYSIWYG editor
Storing Data
[Diagram: storage on AWS. Components shown: API, Flume, and Kafka clients; API Gateway; AWS IoT; Lambda functions; Kinesis; SQS; S3; RDS; DynamoDB.]
S3
Hadoop-side analog: HDFS
• File storage in the cloud
• Store file backups offsite
• Host static websites
• Highly available – 99.99%
• Highly durable – 99.999999999%
• Versioning can be turned on
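The eleven-nines durability figure is easier to appreciate as arithmetic. The sketch below reads it as an annual per-object survival probability, which is an interpretation rather than something stated on the slide, and uses `Decimal` so the tiny difference is not lost to floating-point rounding.

```python
from decimal import Decimal

DURABILITY = Decimal("0.99999999999")  # eleven nines, as quoted above


def expected_objects_lost_per_year(object_count, durability=DURABILITY):
    """Average number of objects expected to be lost in one year."""
    return object_count * (Decimal(1) - durability)


def years_per_lost_object(object_count, durability=DURABILITY):
    """Average number of years between single-object losses."""
    return 1 / expected_objects_lost_per_year(object_count, durability)
```

For ten million stored objects this works out to an expected loss of one object roughly every ten thousand years.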
RDS
Hadoop-side analog: Hive
• Create a cloud-based RDBMS
– Amazon Aurora
– MySQL
– MariaDB
– PostgreSQL
– Oracle
– SQL Server
• Costs based on type of engine, size of database, storage
DynamoDB
Hadoop-side analog: HBase
• NoSQL Document Store
• Handles sparse data
• Pay for Read/Write Capacity and Storage
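Paying for read/write capacity means sizing a table in capacity units. The sketch below assumes DynamoDB's provisioned-capacity rules: one read unit covers one strongly consistent read per second of an item up to 4 KB (eventually consistent reads cost half), and one write unit covers one write per second of an item up to 1 KB. The workload numbers in the usage note are hypothetical.

```python
import math


def read_capacity_units(reads_per_second, item_size_kb, strongly_consistent=True):
    """Read capacity units needed for a steady read workload."""
    units_per_read = math.ceil(item_size_kb / 4)  # reads are metered in 4 KB steps
    total = reads_per_second * units_per_read
    # Eventually consistent reads consume half the units.
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(writes_per_second, item_size_kb):
    """Write capacity units needed for a steady write workload."""
    return writes_per_second * math.ceil(item_size_kb)  # writes are metered in 1 KB steps
```

For example, 100 strongly consistent reads per second of 6 KB items need 200 read units, while 50 writes per second of 2.5 KB items need 150 write units.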
Analyzing Data
[Diagram: analytics on AWS. Components shown: API, Flume, and Kafka clients; API Gateway; AWS IoT; Lambda functions; Kinesis; SQS; S3; RDS; DynamoDB; Athena; Redshift; Machine Learning.]
Athena
Hadoop-side analog: Hive
Athena works with a limited set of formats:
• Apache Web Logs
• CSV
• TSV
• Text File with Custom Delimiters
• JSON
• Parquet
• ORC
Advantages
• Serverless
• Scalable
Redshift
Hadoop-side analog: Spark SQL
• PostgreSQL-compatible syntax with columnar storage
• Designed for DWH/OLAP queries
• Integrates with DynamoDB, S3, and Data Pipeline
• Tunable concurrency limits
Machine Learning
Hadoop-side analogs: Mahout, Spark MLlib
• Offers three types of machine learning models:
– Binary Classification
– Multiclass Classification
– Regression
• Offers batch or synchronous modes
Search
[Diagram: search on AWS. Components shown: API, Flume, and Kafka clients; API Gateway; AWS IoT; Lambda functions; Kinesis; SQS; S3; RDS; DynamoDB; Athena; Redshift; Machine Learning; Elasticsearch.]
Elasticsearch
Hadoop-side analog: Solr
• Amazon’s implementation of Elastic’s Elasticsearch product
• Distributed JSON-based search and analytics engine
• Designed for horizontal scalability, reliability, and easy management
• Combined with Logstash and Kibana to form the ELK stack
Other concerns
• Scalability
• Fault-tolerance
• Security
• Cost
Scalability
• Adding more resources to AWS clusters can be done at the click of a button.
• Most AWS services allow for additional resources to be added. Some allow for autoscaling.
• Autoscaling can be used to limit the cost of cluster operation.
Fault-Tolerance
• AWS services are designed to be self-healing.
• The underlying data store for most applications is S3.
Security
• Security for all of your cluster resources is managed by IAM (Identity and Access Management).
• Policies can be set for each resource with fine-grained access control.
• Arguably, this is one area where having a skilled administrator can be a great help.
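Fine-grained access control in IAM is expressed as JSON policy documents. The sketch below builds a read-only policy for a single S3 bucket as a Python dictionary; the bucket name is hypothetical, and the document follows the standard `Version` / `Statement` / `Effect` / `Action` / `Resource` layout.

```python
import json


def read_only_bucket_policy(bucket_name):
    """Build an IAM policy document granting read-only access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",    # the bucket itself (for listing)
                    f"arn:aws:s3:::{bucket_name}/*",  # the objects inside it
                ],
            }
        ],
    }


# Hypothetical bucket name, used only for illustration.
policy_json = json.dumps(read_only_bucket_policy("example-fridge-telemetry"), indent=2)
```

Keeping policies this narrow is where the administrator skills mentioned above tend to pay off.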
Cost
• You can perform cost calculations before using any services
• You only pay for what you use (no contracts!)
• But you will get a better price if you use Reserved Instances (annual or multi-year contracts)
• You can easily tie infrastructure costs to a product or department
• There is a free tier that can be used for a year
• You just need a credit card to get started
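The trade-off between pay-as-you-go and a reservation can be worked out before committing. The sketch below computes how much of a term an instance must actually run for a reservation to beat on-demand pricing. All prices are caller-supplied placeholders, and the model assumes the reserved hourly rate is charged for every hour of the term whether or not the instance is used.

```python
HOURS_PER_YEAR = 8760


def reserved_total(upfront, reserved_hourly, term_hours=HOURS_PER_YEAR):
    """Total cost of a reservation over its term."""
    return upfront + reserved_hourly * term_hours


def breakeven_utilization(on_demand_hourly, upfront, reserved_hourly,
                          term_hours=HOURS_PER_YEAR):
    """Fraction of the term an instance must run for the reservation to pay off.

    A result above 1.0 means on-demand is cheaper even at full-time use.
    """
    full_time_on_demand = on_demand_hourly * term_hours
    return reserved_total(upfront, reserved_hourly, term_hours) / full_time_on_demand
```

With a hypothetical on-demand rate of 0.10 per hour and an all-upfront reservation of 438 for one year, the break-even point is 50% utilization; below that, turning instances off when idle is the cheaper route.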
And more
Join Our Team
Contact:
Adam.doyle@daugherty.com

Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
StampedeCon
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016
StampedeCon
 
Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016
StampedeCon
 
Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016
StampedeCon
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016
StampedeCon
 

More from StampedeCon (20)

Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
 
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
 
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
 
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
 
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
 
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
 
Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017
 
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
 
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
 
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
 
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
 
A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017
 
Graph in Customer 360 - StampedeCon Big Data Conference 2017
Graph in Customer 360 - StampedeCon Big Data Conference 2017Graph in Customer 360 - StampedeCon Big Data Conference 2017
Graph in Customer 360 - StampedeCon Big Data Conference 2017
 
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
 
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
 
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016
 
Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016
 
Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016
 

Recently uploaded

How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
HackersList
 
Opencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of MünsterOpencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of Münster
Matthias Neugebauer
 
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
digitalxplive
 
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite SolutionIPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Networks
 
“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...
“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...
“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...
Edge AI and Vision Alliance
 
Calgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptxCalgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptx
ishalveerrandhawa1
 
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
bhumivarma35300
 
Salesforce AI & Einstein Copilot Workshop
Salesforce AI & Einstein Copilot WorkshopSalesforce AI & Einstein Copilot Workshop
Salesforce AI & Einstein Copilot Workshop
CEPTES Software Inc
 
Amul milk launches in US: Key details of its new products ...
Amul milk launches in US: Key details of its new products ...Amul milk launches in US: Key details of its new products ...
Amul milk launches in US: Key details of its new products ...
chetankumar9855
 
CiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.pptCiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.ppt
moinahousna
 
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and OllamaTirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Zilliz
 
EuroPython 2024 - Streamlining Testing in a Large Python Codebase
EuroPython 2024 - Streamlining Testing in a Large Python CodebaseEuroPython 2024 - Streamlining Testing in a Large Python Codebase
EuroPython 2024 - Streamlining Testing in a Large Python Codebase
Jimmy Lai
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
RaminGhanbari2
 
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
Priyanka Aash
 
Best Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdfBest Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdf
Tatiana Al-Chueyr
 
How RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptxHow RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptx
SynapseIndia
 
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
sunilverma7884
 
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-InTrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc
 
Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
KAMAL CHOUDHARY
 
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Muhammad Ali
 

Recently uploaded (20)

How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
 
Opencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of MünsterOpencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of Münster
 
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
 
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite SolutionIPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite Solution
 
“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...
“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...
“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...
 
Calgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptxCalgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptx
 
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
 
Salesforce AI & Einstein Copilot Workshop
Salesforce AI & Einstein Copilot WorkshopSalesforce AI & Einstein Copilot Workshop
Salesforce AI & Einstein Copilot Workshop
 
Amul milk launches in US: Key details of its new products ...
Amul milk launches in US: Key details of its new products ...Amul milk launches in US: Key details of its new products ...
Amul milk launches in US: Key details of its new products ...
 
CiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.pptCiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.ppt
 
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and OllamaTirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
 
EuroPython 2024 - Streamlining Testing in a Large Python Codebase
EuroPython 2024 - Streamlining Testing in a Large Python CodebaseEuroPython 2024 - Streamlining Testing in a Large Python Codebase
EuroPython 2024 - Streamlining Testing in a Large Python Codebase
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
 
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
 
Best Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdfBest Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdf
 
How RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptxHow RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptx
 
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
 
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-InTrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
 
Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
 
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
 

Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017

  • 1. Confidential and Proprietary to Daugherty Business Solutions. So You Don't Have an Admin Team - Doing Big Data Using Amazon's Analogs. Adam Doyle, StampedeCon 2017.
  • 2. EIM and Analytics (400+ employees strong)
      Data Science
          • Predictive and Prescriptive Analytics
          • Social, Text and Sentiment Analytics
          • Natural Language Processing
          • Machine Learning, Artificial Intelligence
          • SPSS, SAS, R, IBM Watson™
      Strategy and Competency Building
          • Build the right, comprehensive solution blueprint across 12 Domains
          • Establish a specific, actionable plan and ROIs
          • Protecting your investments
          • Organization, Talent, Competency
          • Processes, Methods, Techniques, Tools
          • Speed – Agile EIM Transformation
          • Governance processes
      Customer and Business Analytics
          • Customer/Buyer/Channel Segmentation
          • Persona Development, Customer Scoring (Value, Potential)
          • Attrition Modeling, Engagement and Response Modeling
          • Inventory Management, Marketing Campaigns
          • Product Design Analytics, Workforce Planning, Location Based Advertising
          • Data Monetization
      Traditional Data Warehouse and Business Intelligence
          • EDW, ODS, Data Mart and Integration
          • Master Data Management
          • Data Governance
          • Dashboards, Scorecards
          • Reports, Alerts
          • Multidimensional Analysis
          • Ad hoc slicing and dicing
          • Self Service Enablement
          • Cloud Migration and Agile EIM
      Digital Engagement/Analytics
          • Customer Engagement Strategies
          • Omni-channel and Integrated Marketing
          • Strategic Planning, Building and Executing Digital and Customer Engagement Solutions
      Big Data and Next Generation Technologies
          • Data Lab Development Centers
          • Data Lakes, Analytic Platforms
          • Hadoop (Cloudera, Hortonworks)
          • NoSQL / Graph DB (MongoDB, DataStax)
          • Cloud platforms (AWS, Google, Azure)
          • Spark, Sqoop, Hive, Pig, Kafka, etc.
  • 3. Meet Adam Doyle
      • 20 year veteran of the St. Louis IT community
      • Co-Organizer, St. Louis Hadoop User Group
      • Big Data Community Lead, Daugherty Business Solutions
      • Formerly Big Data Solution Architect at Amitech, Lead Big Data developer at Mercy
      • Speaker at local and national Big Data conferences
  • 4. Problem Statement
      You are developing an Internet of Things solution for a kitchen appliance manufacturer. Essentially you are trying to answer the eternal question:
      [image: http://www.routercheck.com/2014/01/27/is-your-refrigerator-running/]
  • 5. Let’s say you wanted to do Big Data on AWS
      • Great! You’ve got options
          – Hadoop on EC2 with a distribution
          – Hadoop on EMR with a distribution
          – Hadoop on EMR with Amazon’s Hadoop version
  • 6. So what would that look like?
      [diagram, Hadoop: API Client, Flume Client, Kafka Client, NiFi, Kafka, Spark Streaming, HBase, Hive, HDFS, Mahout, Spark MLlib, Spark SQL, SOLR]
  • 7. EC2
      • Virtual machines in the cloud
      • Choice of many different options
          – Operating system
          – Processors
          – Memory
          – Disk sizes
      • Can be created in minutes
      • Can be created through code
      • Can be turned off when not needed to reduce costs
  • 8. The downsides
      • All of these options require a Hadoop administrator who can tune the installation for performance.
      • Your servers generally need to be up and running, so you pay for them even when they are not heavily utilized.
  • 9. Or …
      • You can use Amazon’s services to roll your own Big Data application
      [image: http://www.writingfordesigners.com/?p=19906]
  • 10. Ingest
      [diagram, AWS: API Gateway, Flume Client, AWS IoT, Lambda (x3), AWS Greengrass]
  • 11. API Gateway
      • Three step process to set up an API
          – Define the API
          – Create the client
          – Create the server
      • Wizard to help define the API
      • Connects to Lambda, DynamoDB, EC2, S3
      Analog in the Hadoop-based design: API Client
  • 12. Lambda
      • Serverless code execution
      • No servers to provision or manage
      • Event trigger based
      • You pay only for code execution time
      • Automatic scaling up to user defined thresholds
      • As of this 2017 talk, only a few languages are supported (Node.js, Java, Python, and C#)
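A minimal sketch of the kind of function Lambda runs, assuming a Python runtime and an API Gateway proxy-style event whose `body` is a JSON string. The name `api_handler`, the `device_id` and `temperature_c` fields, and the 8 °C threshold are hypothetical, chosen only to fit the refrigerator example.

```python
import json

# Hypothetical threshold: anything warmer than this counts as "not running".
MAX_SAFE_TEMP_C = 8.0


def api_handler(event, context):
    """Entry point invoked once per incoming request."""
    try:
        reading = json.loads(event.get("body") or "{}")
        device_id = reading["device_id"]
        temperature_c = float(reading["temperature_c"])
    except (KeyError, TypeError, ValueError):
        # Malformed payloads get a client error instead of crashing the function.
        return {"statusCode": 400, "body": json.dumps({"error": "bad reading"})}

    result = {
        "device_id": device_id,
        "running": temperature_c <= MAX_SAFE_TEMP_C,
    }
    return {"statusCode": 200, "body": json.dumps(result)}
```

Because the handler is a plain function, it can be exercised locally by passing a dictionary as the event and `None` as the context.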
  • 13. AWS IoT
      • Device Gateway
      • Message Broker
      • Rules Engine
      • Security and Identity Service
      • Thing Registry
      • Thing Shadow
      • Thing Shadows Service
      • Integrations with other AWS components
      • Processing SDK
      • Device SDK
  • 14. AWS Greengrass
      • Extends the functions of AWS IoT to intermittently connected devices
      • Devices connect to a local Greengrass core
      • Core connects to server when connection is present
      Analog in the Hadoop-based design: NiFi
  • 15. Processing data in real-time
      [diagram, AWS: API Gateway, AWS IoT, Flume Client, Lambda, Kinesis, SQS]
  • 16. Kinesis
      • Publish/subscribe messaging service built on streams, which play the role of Kafka topics
      • Consumer/publisher bandwidth can be resized dynamically
      • Records expire after 24 hours by default
      Analog in the Hadoop-based design: Kafka
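A sketch of how a Lambda function might consume a batch from a Kinesis stream, assuming the standard event shape in which each record carries its payload base64-encoded under `kinesis.data`. The name `stream_handler` and the `device_id` and `temperature_c` fields are hypothetical.

```python
import base64
import json


def stream_handler(event, context):
    """Decode a batch of Kinesis records and average the temperature per device."""
    totals = {}
    for record in event.get("Records", []):
        # Each payload arrives base64-encoded under kinesis.data.
        payload = base64.b64decode(record["kinesis"]["data"])
        reading = json.loads(payload)
        count, total = totals.get(reading["device_id"], (0, 0.0))
        totals[reading["device_id"]] = (count + 1, total + reading["temperature_c"])
    # Collapse the running (count, total) pairs into one average per device.
    return {device: total / count for device, (count, total) in totals.items()}
```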
  • 17. Simple Queue Service (SQS)
      • Queue based service
      • Destructive reads
      Standard Queue            | FIFO Queue
      High throughput           | Limited throughput (300 TPS)
      At-Least-Once Delivery    | Exactly-Once Processing
      Best-Effort Ordering      | First-In-First-Out Delivery
      Analog in the Hadoop-based design: Spark Streaming
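At-least-once delivery on a standard queue means a consumer can see the same message twice. One common response is an idempotent consumer, sketched below under the assumption that messages arrive as dictionaries with `MessageId` and `Body` keys. The function `process_once` is hypothetical, and the in-memory set stands in for what would need to be a durable store in practice.

```python
def process_once(messages, seen_ids, apply):
    """Apply each message at most once, skipping redelivered duplicates."""
    applied = 0
    for message in messages:
        message_id = message["MessageId"]
        if message_id in seen_ids:
            continue  # duplicate delivery: this message was already handled
        apply(message["Body"])
        seen_ids.add(message_id)
        applied += 1
    return applied
```

A FIFO queue removes the need for this bookkeeping at the cost of the lower throughput listed in the table.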
  • 18. Data Pipeline
      • Scheduled batch operations
      • WYSIWYG editor
  • 19. Storing Data
      [diagram, AWS: API Client, Flume Client, Kafka Client, API Gateway, AWS IoT, Lambda, Kinesis, SQS, S3, RDS, DynamoDB]
  • 20. S3
      • File storage in the cloud
      • Store file backups offsite
      • Host static websites
      • Highly available – 99.99%
      • Highly durable – 99.999999999%
      • Versioning can be turned on
      Analog in the Hadoop-based design: HDFS
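To put the two percentages in concrete terms, here is a small arithmetic sketch. It assumes both figures are read as annual targets, and the function names are hypothetical. At 99.99% availability the budget is roughly 52.6 minutes of unavailability per year; at eleven nines of durability, 10 million stored objects correspond to an expected loss of about one object every 10,000 years.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def expected_downtime_minutes(availability):
    """Minutes per year a service may be unavailable at the given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def expected_objects_lost_per_year(object_count, durability):
    """Average number of objects lost per year, treating durability as an annual figure."""
    return object_count * (1.0 - durability)
```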
  • 21. RDS
      • Create a cloud-based RDBMS
          – Amazon Aurora
          – MySQL
          – MariaDB
          – PostgreSQL
          – Oracle
          – SQL Server
      • Costs based on type of engine, size of database, storage
      Analog in the Hadoop-based design: Hive
  • 22. DynamoDB
      • NoSQL Document Store
      • Handles sparse data
      • Pay for Read/Write Capacity and Storage
      Analog in the Hadoop-based design: HBase
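"Handles sparse data" means two items in the same table need not share attributes beyond the key. The sketch below marshals plain Python records into DynamoDB's typed JSON representation (`S`, `N`, `BOOL`, `L`, `M`) and simply omits attributes that have no value. The helper names and the sample appliance fields are hypothetical, and only a subset of DynamoDB's types is covered.

```python
def to_attribute_value(value):
    """Convert a plain Python value to DynamoDB's typed JSON representation."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers are transmitted as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def to_item(record):
    """Marshal a record, leaving out attributes that have no value."""
    return {k: to_attribute_value(v) for k, v in record.items() if v is not None}
```

A refrigerator item and an oven item built this way can carry entirely different attribute sets while sharing only `device_id`.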
  • 23. Analyzing Data
      [diagram, AWS: API Client, Flume Client, Kafka Client, API Gateway, AWS IoT, Lambda, Kinesis, SQS, S3, RDS, DynamoDB, Athena, Redshift, Machine Learning]
  • 24. Athena
      Athena has a limited set of formats that it works with:
          • Apache Web Logs
          • CSV
          • TSV
          • Text File with Custom Delimiters
          • JSON
          • Parquet
          • ORC
      Advantages
          • Serverless
          • Scalable
      Analog in the Hadoop-based design: Hive
  • 25. Redshift
      • PostgreSQL compatible syntax with columnar storage
      • Designed for DWH/OLAP queries
      • Integrates with DynamoDB, S3, and Data Pipeline
      • Tunable concurrency limits
      Analog in the Hadoop-based design: Spark SQL
  • 26. Machine Learning
      • Offers three types of machine learning models:
          – Binary Classification
          – Multiclass Classification
          – Regression
      • Offers batch or synchronous modes
      Analogs in the Hadoop-based design: Mahout, Spark MLlib
  • 27. Search
      [diagram, AWS: API Client, Flume Client, Kafka Client, API Gateway, AWS IoT, Lambda, Kinesis, SQS, S3, RDS, DynamoDB, Athena, Redshift, Machine Learning, Elastic Search]
  • 28. Elastic Search
      • Amazon’s implementation of Elastic’s Elasticsearch product
      • Distributed JSON-based search and analytics engine
      • Designed for horizontal scalability, reliability, and easy management
      • Combined with Logstash and Kibana to form the ELK stack
      Analog in the Hadoop-based design: SOLR
  • 29. Other concerns
      • Scalability
      • Fault-tolerance
      • Security
      • Cost
  • 30. Scalability
      • Adding more resources to AWS clusters can be done at the click of a button.
      • Most AWS services allow for additional resources to be added. Some allow for autoscaling.
      • Autoscaling can be used to limit the cost of cluster operation.
  • 31. Fault-Tolerance
      • AWS services are designed to be self-healing.
      • The underlying data store for most applications is S3.
  • 32. Security
      • Security for all of your cluster resources is managed by IAM (Identity and Access Management).
      • Policies can be set for each resource with fine-grained access control.
      • Arguably, this is one area where having a skilled administrator can be a great help.
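As an illustration of fine-grained access control, here is a sketch that builds an IAM policy document granting read-only access to a single S3 bucket. The function name and the bucket name `example-telemetry-bucket` are hypothetical; the `Version`, `Statement`, `Effect`, `Action`, and `Resource` keys follow the IAM JSON policy format.

```python
import json


def read_only_bucket_policy(bucket_name):
    """Build an IAM policy document allowing read-only access to one S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",    # the bucket itself (for ListBucket)
                    f"arn:aws:s3:::{bucket_name}/*",  # every object in it (for GetObject)
                ],
            }
        ],
    }


# Serialize to the JSON text that would be attached to a user, group, or role.
policy_json = json.dumps(read_only_bucket_policy("example-telemetry-bucket"), indent=2)
```

Anything not explicitly allowed is denied by default, so a policy like this keeps an analytics role from writing to or deleting from the bucket.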
  • 33. Cost
      • You can perform cost calculations before using any services
      • You only pay for what you use (no contracts!)
      • But you will get a better price if you use Reserved Instances (annual or multi-year contracts)
      • You can easily tie infrastructure costs to a product or department
      • There is a free tier that can be used for a year
      • You just need a credit card to get started
  • 34. And more
  • 35. Join Our Team. Contact: Adam.doyle@daugherty.com