Big Data Architectural Patterns and Best Practices

Find out more about the Architectural Patterns and Best Practices on Big Data.

Published in: Software

Big Data Architectural Patterns and Best Practices

  1. Big Data Architectural Patterns and Best Practices. Jesper Söderlund, Manager Solutions Architecture, 2017-05-03. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. Agenda
     • Big Data Challenges
     • Architecture principles
     • What technologies should you use? How? Why?
     • Reference architecture
     • Design patterns
     • Customer Story: The Move to real-time data architectures, DNA Oy
  3. Big Data Evolution: Batch processing, Stream processing, Artificial Intelligence
  4. Plethora of Tools: Amazon Glacier, S3, DynamoDB, RDS, EMR, Amazon Redshift, Data Pipeline, Amazon Kinesis, Lambda, Amazon ML, SQS, ElastiCache, DynamoDB Streams, Amazon Elasticsearch Service, Amazon Kinesis Analytics, Amazon QuickSight
  5. Big Data Challenges
     • Why?
     • How?
     • What tools should I use?
     • Is there a reference architecture?
  6. Architectural Principles
     • Build decoupled systems: Data → Store → Process → Store → Analyze → Answers
     • Use the right tool for the job: data structure, latency, throughput, access patterns
     • Leverage AWS managed services: scalable/elastic, available, reliable, secure, no/low admin
     • Use log-centric design patterns: immutable logs, materialized views
     • Be cost-conscious: big data ≠ big cost
  7. Simplify Big Data Processing: COLLECT → STORE → PROCESS / ANALYZE → CONSUME. Dimensions: time to answer (latency), throughput, cost.
  8. Types of Data (COLLECT)
     • Applications (mobile apps, web apps, data centers, AWS Direct Connect): RECORDS. In-memory data structures, database records.
     • Logging (Amazon CloudWatch, AWS CloudTrail) and Transport (AWS Import/Export Snowball): DOCUMENTS, FILES. Search documents, log files.
     • Messaging (message): MESSAGES. Messaging messages.
     • IoT (devices, sensors & IoT platforms, AWS IoT): STREAMS. Data streams.
     • Further labels on the slide: transactions, files, events.
  9. What Is the Temperature of Your Data?
  10. Data Characteristics: Hot, Warm, Cold

                     Hot data     Warm data    Cold data
     Volume          MB–GB        GB–TB        PB–EB
     Item size       B–KB         KB–MB        KB–TB
     Latency         ms           ms, sec      min, hrs
     Durability      Low–high     High         Very high
     Request rate    Very high    High         Low
     Cost/GB         $$-$         $-¢¢         ¢
  11. Store
  12. Types of Data Stores (STORE)
     • Database: SQL & NoSQL databases
     • Search: search engines
     • File store: file systems
     • Queue: message queues
     • Stream storage: pub/sub message queues
     • In-memory: caches, data structure servers
     Diagram: the COLLECT sources (applications, logging, transport, messaging, IoT) feed the STORE stage.
  13. Message & Stream Storage
     • Amazon SQS: managed message queue service
     • Apache Kafka: high-throughput distributed streaming platform
     • Amazon Kinesis Streams: managed stream storage + processing
     • Amazon Kinesis Firehose: managed data delivery
     • Amazon DynamoDB: managed NoSQL database; tables can be stream-enabled
     Diagram: the COLLECT sources feed the STORE stage (in-memory, database, search, file store, message, stream); Amazon DynamoDB Streams, Amazon Kinesis Streams, Amazon Kinesis Firehose, Apache Kafka and Amazon SQS are highlighted.
  14. Why Stream Storage?
     • Decouple producers & consumers
     • Persistent buffer
     • Collect multiple streams
     • Preserve client ordering
     • Parallel consumption
     • Streaming MapReduce
     Diagram: records numbered 1 to 4 per color flow into shard 1 / partition 1 and shard 2 / partition 2 of a DynamoDB stream, Amazon Kinesis stream or Kafka topic. Consumer 1: count of red = 4, count of violet = 4. Consumer 2: count of blue = 4, count of green = 4.
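The shard/partition picture on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any of the named services is implemented: `shard_for`, `partition` and `consume` are hypothetical helper names, and a CRC32 hash modulo the shard count stands in for whatever key-to-shard mapping a real stream uses. It shows the two properties the slide lists: records with the same key land in one shard in their original order, and each shard can be counted by its own consumer.

```python
import zlib
from collections import Counter, defaultdict


def shard_for(key: str, num_shards: int) -> int:
    """Map a partition key to a shard (assumption: stable CRC32 hash, modulo)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def partition(records, num_shards):
    """Append each (key, sequence) record to the shard its key maps to."""
    shards = defaultdict(list)
    for key, seq in records:
        shards[shard_for(key, num_shards)].append((key, seq))
    return shards


def consume(shard_records):
    """One consumer per shard: count the records seen for each key."""
    return Counter(key for key, _ in shard_records)


# Four colors, each producing records 1..4, interleaved as on the slide.
colors = ("red", "violet", "blue", "green")
records = [(color, seq) for seq in range(1, 5) for color in colors]

shards = partition(records, num_shards=2)
per_shard_counts = {sid: consume(recs) for sid, recs in shards.items()}

# Merging the per-shard results gives the global count per color.
total = sum(per_shard_counts.values(), Counter())
```

Because every record for a given color goes to the same shard, each consumer can compute its counts independently, and the merge step is a plain sum.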
  15. What About Amazon SQS?
     • Decouple producers & consumers
     • Persistent buffer
     • Collect multiple streams
     • No client ordering (Standard)
     • FIFO queue preserves client ordering
     • No streaming MapReduce
     • No parallel consumption
     • Amazon SNS can publish to multiple SNS subscribers (queues or Lambda functions)
     Diagram: a publisher sends to an Amazon SNS topic, which delivers to an AWS Lambda function and an Amazon SQS queue read by subscribers/consumers. A second panel contrasts message ordering in Standard and FIFO queues.
  16. Which Stream/Message Storage Should I Use?

     Attribute              | DynamoDB Streams      | Kinesis Streams   | Kinesis Firehose            | Apache Kafka      | SQS (Standard)       | SQS (FIFO)
     AWS managed            | Yes                   | Yes               | Yes                         | No                | Yes                  | Yes
     Guaranteed ordering    | Yes                   | Yes               | No                          | Yes               | No                   | Yes
     Delivery (deduping)    | Exactly-once          | At-least-once     | At-least-once               | At-least-once     | At-least-once        | Exactly-once
     Data retention period  | 24 hours              | 7 days            | N/A                         | Configurable      | 14 days              | 14 days
     Availability           | 3 AZ                  | 3 AZ              | 3 AZ                        | Configurable      | 3 AZ                 | 3 AZ
     Scale / throughput     | No limit / ~ table IOPS | No limit / ~ shards | No limit / automatic     | No limit / ~ nodes | No limits / automatic | 300 TPS / queue
     Parallel consumption   | Yes                   | Yes               | No                          | Yes               | No                   | No
     Stream MapReduce       | Yes                   | Yes               | N/A                         | Yes               | N/A                  | N/A
     Row/object size        | 400 KB                | 1 MB              | Destination row/object size | Configurable      | 256 KB               | 256 KB
     Cost                   | Higher (table cost)   | Low               | Low                         | Low (+admin)      | Low-medium           | Low-medium

     Slide labels: Hot, Warm, New.
  17. File Storage. Diagram: the COLLECT sources (applications, logging, transport, messaging, IoT) feed the STORE stage: hot stream storage (Apache Kafka, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams), message storage (Amazon SQS) and file storage (Amazon S3), alongside in-memory, database and search stores.
  18. In-memory, Database, Search. Diagram: the same COLLECT and STORE layout as the previous slide, with the in-memory, database and search stores in focus.
  19. Anti-Pattern: Data Tier
  20. Best Practice: Use the Right Tool for the Job (Data Tier)
     • Search: Amazon Elasticsearch Service
     • In-memory: Amazon ElastiCache, Redis, Memcached
     • SQL: Amazon Aurora, Amazon RDS, MySQL, PostgreSQL, Oracle, SQL Server
     • NoSQL: Amazon DynamoDB, Cassandra, HBase, MongoDB
  21. Materialized Views & Immutable Log. Diagram labels: views, immutable log.
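The relationship between an immutable log and its materialized views can be sketched as follows. This is a minimal in-memory illustration, not a description of any AWS service: `Event`, `ImmutableLog`, `balance_view` and `activity_view` are hypothetical names invented for the example. The idea it demonstrates is that events are only ever appended, and every view is a pure function of the log, so a view can be thrown away and rebuilt by replaying the log.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    """One immutable fact (hypothetical shape: an account and an amount delta)."""
    account: str
    delta: int


class ImmutableLog:
    """Append-only log: events can be added and read, never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> int:
        """Store the event and return its offset in the log."""
        self._events.append(event)
        return len(self._events) - 1

    def read(self, start: int = 0) -> Tuple[Event, ...]:
        """Return an immutable snapshot of the log from the given offset."""
        return tuple(self._events[start:])


def balance_view(events: Iterable[Event]) -> Dict[str, int]:
    """Materialized view 1: current balance per account."""
    view: Dict[str, int] = {}
    for event in events:
        view[event.account] = view.get(event.account, 0) + event.delta
    return view


def activity_view(events: Iterable[Event]) -> Dict[str, int]:
    """Materialized view 2: number of events per account, from the same log."""
    view: Dict[str, int] = {}
    for event in events:
        view[event.account] = view.get(event.account, 0) + 1
    return view


log = ImmutableLog()
for event in (Event("a", 10), Event("b", 5), Event("a", -3)):
    log.append(event)

balances = balance_view(log.read())
activity = activity_view(log.read())
```

Several views with different shapes can be derived from one log, which is what lets each consumer store data in the format it will access.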
  22. Diagram: the COLLECT sources feed the STORE stage: stream (Apache Kafka, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams; hot), message (Amazon SQS), search (Amazon Elasticsearch Service), SQL (Amazon RDS), NoSQL (Amazon DynamoDB), cache (Amazon ElastiCache), file (Amazon S3).
     • Amazon ElastiCache: managed Memcached or Redis service
     • Amazon DynamoDB: managed NoSQL database service
     • Amazon RDS: managed relational database service
     • Amazon Elasticsearch Service: managed Elasticsearch service
  23. Which Data Store Should I Use?
     • Data structure → Fixed schema, JSON, key-value
     • Access patterns → Store data in the format you will access it
     • Data characteristics → Hot, warm, cold
     • Cost → Right cost
  24. Diagram: data stores placed from hot data through warm data to cold data (labels: In-memory, SQL, NoSQL, Amazon Glacier). Request rate: High → Low. Cost/GB: High → Low. Latency: Low → High. Data volume: Low → High. Structure axis: Low, High.
  25. Cost-Conscious Design. Example: Should I use Amazon S3 or Amazon DynamoDB?
     “I’m currently scoping out a project. The design calls for many small files, perhaps up to a billion during peak. The total size would be on the order of 1.5 TB per month…”

     Request rate (writes/sec)   Object size (bytes)   Total size (GB/month)   Objects per month
     300                         2,048                 1,483                   777,600,000
  26. Amazon S3 or DynamoDB?

     Request rate (writes/sec)   Object size (bytes)   Total size (GB/month)   Objects per month
     300                         2,048                 1,483                   777,600,000
  27. Two scenarios

                  Request rate (writes/sec)   Object size (bytes)   Total size (GB/month)   Objects per month
     Scenario 1   300                         2,048                 1,483                   777,600,000
     Scenario 2   300                         32,768                23,730                  777,600,000

     Slide labels: use Amazon S3; use Amazon DynamoDB.
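The figures in the scenario table follow from the request rate and object size alone. The sketch below reproduces them under two assumptions that are inferred from the numbers rather than stated on the slides: a 30-day month and 1 GB = 2^30 bytes. The function names are hypothetical; no pricing is modeled, only the sizing arithmetic.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30          # assumption: the slide figures match a 30-day month
BYTES_PER_GB = 2 ** 30       # assumption: binary gigabytes


def objects_per_month(writes_per_sec: int) -> int:
    """Objects written in one month at a constant write rate."""
    return writes_per_sec * SECONDS_PER_DAY * DAYS_PER_MONTH


def total_gb_per_month(writes_per_sec: int, object_size_bytes: int) -> float:
    """Data volume written in one month, in GB."""
    return objects_per_month(writes_per_sec) * object_size_bytes / BYTES_PER_GB


scenario_1 = (objects_per_month(300), total_gb_per_month(300, 2_048))
scenario_2 = (objects_per_month(300), total_gb_per_month(300, 32_768))

print(f"Scenario 1: {scenario_1[0]:,} objects, {scenario_1[1]:,.0f} GB/month")
print(f"Scenario 2: {scenario_2[0]:,} objects, {scenario_2[1]:,.0f} GB/month")
```

Both scenarios write the same number of objects per month; only the object size, and therefore the stored volume, differs by a factor of 16.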
  28. PROCESS / ANALYZE
  29. Analytics Types & Frameworks (PROCESS / ANALYZE)
     • Batch: takes minutes to hours. Example: daily/weekly/monthly reports. Amazon EMR (MapReduce, Hive, Pig, Spark)
     • Interactive: takes seconds. Example: self-service dashboards. Amazon Redshift, Amazon Athena, Amazon EMR (Presto, Spark)
     • Message: takes milliseconds to seconds. Example: message processing. Amazon SQS applications on Amazon EC2
     • Stream: takes milliseconds to seconds. Example: fraud alerts, 1-minute metrics. Amazon EMR (Spark Streaming), Amazon Kinesis Analytics, KCL, Storm, AWS Lambda
     • Artificial Intelligence: takes milliseconds to minutes. Example: fraud detection, demand forecasting, text to speech. Amazon AI (Lex, Polly, ML, Rekognition), Amazon EMR (Spark ML), Deep Learning AMI (MXNet, TensorFlow, Theano, Torch, CNTK and Caffe)
     Diagram: frameworks grouped by type (stream, message, interactive, batch, AI) and ordered from fast to slow.
  30. What About ETL? (between STORE and PROCESS / ANALYZE)
     • Data Integration Partners: reduce the effort to move, cleanse, synchronize, manage, and automate data-related processes. https://aws.amazon.com/big-data/partner-solutions/
     • AWS Glue (new): a fully managed ETL service that makes it easy to understand your data sources, prepare the data, and move it reliably between data stores.
  31. CONSUME
  32. Diagram: COLLECT, STORE, PROCESS / ANALYZE, CONSUME, with ETL.
     • COLLECT: applications (mobile apps, web apps, data centers, AWS Direct Connect), logging (Amazon CloudWatch, AWS CloudTrail), transport (AWS Import/Export Snowball), messaging, IoT (devices, sensors & IoT platforms, AWS IoT); data types RECORDS, DOCUMENTS, FILES, MESSAGES, STREAMS
     • STORE: stream (Apache Kafka, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams), message (Amazon SQS), file (Amazon S3), search (Amazon Elasticsearch Service), SQL (Amazon RDS), NoSQL (Amazon DynamoDB), cache (Amazon ElastiCache); labels hot, warm
     • PROCESS / ANALYZE: stream (Amazon Kinesis Analytics, KCL apps, AWS Lambda, Amazon EC2, Amazon EMR), message (Amazon SQS apps, Amazon EC2), interactive (Amazon Redshift, Presto, Amazon EMR, Amazon Athena), batch (Amazon EMR), AI (Amazon AI); labels fast, slow
  33. CONSUME
     • Amazon QuickSight, apps & services, analysis & visualization, notebooks, IDE, API
     • Audiences: business users; data scientists, developers
  34. Putting It All Together
  35. Diagram: the full pipeline, combining the COLLECT, STORE, ETL and PROCESS / ANALYZE components of the preceding diagrams with the CONSUME stage (Amazon QuickSight, apps & services, analysis & visualization, notebooks, IDE, API).
  36. Design Patterns
  37. Diagram: data temperature (hot to cold) against processing speed (fast to slow), from data to answers. Stores: Amazon Kinesis, Amazon DynamoDB, Amazon S3. Processing: Spark Streaming, Apache Storm, AWS Lambda, KCL apps, native apps, Amazon Redshift, Hive, Spark, Presto, Amazon Athena.
  38. Real-time Analytics. Diagram:
     • Source: Amazon Kinesis
     • Process: KCL app, AWS Lambda, Spark Streaming (Amazon EMR), Amazon Kinesis Analytics, Amazon AI
     • Store: Amazon SNS, Amazon ElastiCache (Redis), Amazon DynamoDB, Amazon RDS, Amazon ES, Amazon S3, Amazon Kinesis
     • Output labels: notifications, alerts, app state or materialized view, real-time prediction, KPI, log, fan out
  39. Interactive & Batch Analytics. Diagram:
     • Source: files, via Amazon Kinesis Firehose
     • Store: Amazon S3
     • Process: batch on Amazon EMR (Hive, Pig, Spark); interactive on Amazon Redshift, Amazon EMR (Presto, Spark), Amazon Athena and Amazon Kinesis Analytics; Amazon AI for batch prediction and real-time prediction
     • Consume
  40. Data Lake. Diagram:
     • Sources: applications, transactions, change data capture from Amazon DynamoDB and Amazon RDS, stream (Amazon Kinesis), files (Amazon Kinesis Firehose)
     • Data lake store: Amazon S3
     • Interactive & batch: Amazon Redshift, Amazon EMR (Presto, Hive, Pig, Spark), Amazon Athena
     • Real-time: AWS Lambda, Storm, Spark Streaming on Amazon EMR, KCL, Amazon Kinesis Analytics, Amazon AI
     • App state or materialized view: Amazon ElastiCache, Amazon DynamoDB, Amazon RDS, Amazon ES
  41. The Move to a real-time data architecture. Jarno Kartela, Chief Data Scientist, DNA Oy
  42. DNA Today (Public)
     • DNA is one of the leading Finnish telecommunications groups
     • EUR 859 million: net sales in 2016
     • EUR 91 million: operating result in 2016
     • 3.8 million: mobile communications and fixed network customer subscriptions
     • TV: Finland’s largest cable operator and the leading pay TV provider
     • 64 DNA stores: Finland’s most extensive retailer of mobile phones, other mobile devices and mobile subscriptions
     • 1,668: at the end of 2016, there were 1,668 employees working with DNA
     • Strong employee satisfaction: the personnel's satisfaction with DNA as an employer is at a record-breaking high level
     • Our values: FAST (DNA's customers receive quick and helpful service), STRAIGHTFORWARD (DNA’s approach is clear and responsible), BOLD (we are direct, open-minded and ready for change)
     • The customer is at the center of DNA’s strategy
  43. Introduction: our team’s mission at DNA
  44. SAD* data. *Data you are forced to collect even though no-one wants it as a customer & no-one needs it in your business & no-one
  45. We help DNA avoid SAD data & create data & analytics assets so good that they create fear in our competition.
  46. Data Status Quo
  47. ETL
  48. ETL, Analytics
  49. ETL, Analytics, Great wall of China, “Online” Analytics
  50. ETL, Analytics, Great wall of China, “Online” Analytics, Greater wall of China, Marketing?
  51. ETL, Analytics, Great wall of China, “Online” Analytics, Greater wall of China, Marketing?, Brand stuff, Product dev
  52. ETL, Analytics, Great wall of China, “Online” Analytics, Greater wall of China, Marketing?, Brand stuff, Product dev, “AI”
  53. ETL, Analytics, Great wall of China, “Online” Analytics, Greater wall of China, Marketing?, Brand stuff, Product dev, Consultants, “AI”
  54. All hail AWS
  55. Diagram: a real-time data feed connecting the analytics dept., customer interfaces, reporting dept., marketing dept., customer service dept. and product dept.
  56. Diagram: the same real-time data feed and departments, now labeled “Data as customer experience”.
  57. Benefits
  58. Product / infrastructure dev: potential scoring
  59. Product / infrastructure dev: real-time insight
  60. Marketing: channel-agnostic metrics
  61. Marketing & CX: machine learning for recommendations & personalization
  62. AI: understanding & classifying natural language (...and later on, for bots)
  63. Customer service: 360 view & recommendations
  64. Data as customer experience: Input, Output
  65. Data as customer experience: Input, Output, Better CX
  66. Benefits
     Customer
     • unified experience across channels
     • personalized content
     • better offers
     • less time spent on browsing DNA services
     • overall better CX
     • coming: my data & ability to change the data about you as a customer
     Business
     • time spent on marketing 10x less than before
     • sales up 3x across AI-driven campaigns
     • sales up 2x-10x across campaigns triggered by real-time data
     • near real time (5 min) insight on what’s happening
     • insight across channels
     IT / Analytics
     • better service quality with less time & resources
     • devops, automation
     • endless scale for analytics & data
     • ease of try-fail-try-again
     • cost effectiveness
     • one source of truth for data
     • ability to serve all channels & functions with one platform
     • 6 months to full scale production
  67. Coming up: Rekognition for retail analytics, Lex/similar for voice dialogue and chatbots, GDPR -> create services instead of fear
  68. Closing remarks
  69. Summary
     • Build decoupled systems: use Amazon S3 as the data fabric of your data lake; Data → Store → Process → Store → Analyze → Answers
     • Use the right tool for the job: data structure, latency, throughput, access patterns
     • Leverage AWS managed services: scalable/elastic, available, reliable, secure, no/low admin
     • Use log-centric design patterns: immutable log, batch, interactive & real-time views
     • Be cost-conscious: big data ≠ big cost
  70. Building a Data Lake on AWS: Kinesis Firehose, Athena Query Service
  71. Resources
     • https://aws.amazon.com/blogs/big-data/introducing-the-data-lake-solution-on-aws/
     • AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ecosystem (BDM306)
     • AWS re:Invent 2016: Deep Dive on Amazon S3 (STG303)
     • https://aws.amazon.com/blogs/big-data/reinvent-2016-aws-big-data-machine-learning-sessions/
     • https://aws.amazon.com/blogs/big-data/implementing-authorization-and-auditing-using-apache-ranger-on-amazon-emr/
