2. Outline
Introduction
Solution Architecture
- Storage
- Serverless Database
- Application Server
- BI/Reporting Service
Authentication & Authorization
Notes and Tips about Billing
Build and Demo
Building Your Instance Tips
Compliance - Data Privacy
Questions
References
6. Storage
➔ AWS S3 buckets are used as the storage service
➔ S3 is an object storage service for storing and retrieving data
➔ Reduces the effort of maintaining your own storage
➔ Highly scalable, and you pay only for what you use
➔ Key-based object storage; the folder structure is virtual (folders are just key prefixes)
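The "virtual folder" point can be illustrated without touching AWS at all. The sketch below is plain Python that mimics how a prefix-and-delimiter listing groups flat keys into folder-like levels; the function name `list_prefix` and the sample keys are hypothetical and only serve the illustration.

```python
def list_prefix(keys, prefix="", delimiter="/"):
    """Mimic an S3-style listing of a flat key space.

    Returns (objects, folders): keys that sit directly under `prefix`,
    and the virtual "folders" (common prefixes) one level below it.
    """
    objects, folders = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Something follows the delimiter, so this key lives in a deeper "folder".
            folders.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(folders)


# Hypothetical object keys; the bucket itself stores them as a flat list.
keys = [
    "sales/2019/jan.csv",
    "sales/2019/feb.csv",
    "sales/readme.txt",
    "logs/app.log",
]

top_level = list_prefix(keys)
under_sales = list_prefix(keys, "sales/")
```

No folder is ever created here: "sales/2019/" appears only because two keys happen to share that prefix.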
7. Serverless Database
➔ Amazon Athena is used as the serverless database service for analytics
➔ Athena uses:
◆ Hive to create, drop, and alter tables and partitions
◆ Presto, a distributed SQL engine, to run queries
➔ Serves as an interactive query service for analyzing data in S3
➔ Can query data files in formats including CSV, JSON, and Avro, or columnar formats such
as Apache Parquet and ORC
➔ Data is not loaded into Athena; it only provides a service for querying the data where it is stored
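Because Athena only points at files in S3, a table definition is essentially a schema plus a location. The sketch below builds a Hive-style `CREATE EXTERNAL TABLE` statement for comma-separated files; the helper name, database, table, columns, and bucket path are all hypothetical, and the delimiter and header settings are assumptions about the data.

```python
def create_external_table_ddl(database, table, columns, s3_location, skip_header=True):
    """Build a Hive-style DDL statement for CSV files stored under an S3 prefix.

    `columns` is a list of (name, type) pairs. Nothing is executed here;
    the function only returns the statement text.
    """
    cols = ",\n".join(f"  `{name}` {dtype}" for name, dtype in columns)
    ddl = (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n{cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{s3_location}'"
    )
    if skip_header:
        # Assumes each CSV file starts with one header row.
        ddl += "\nTBLPROPERTIES ('skip.header.line.count'='1')"
    return ddl


# Hypothetical names, for illustration only.
ddl = create_external_table_ddl(
    database="demo_db",
    table="sales",
    columns=[("order_id", "string"), ("amount", "double"), ("order_date", "string")],
    s3_location="s3://example-bucket/sales/",
)
print(ddl)
```

The resulting text could be pasted into the Athena query editor; dropping the table later removes only the definition, not the files in S3.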
9. Application Server
➔ Ubuntu Linux 18.04 server running on an EC2 instance
➔ Instance is launched from an Amazon Machine Image (AMI) preconfigured by Redash
➔ At least a t2.small instance is needed to run Redash
10. BI / Reporting Service
➔ Redash: a lightweight, generic open-source reporting and data visualization tool
➔ Variety of hosting options
◆ Open-source, self-hosted (Linux machine or Docker)
◆ Open-source, cloud-hosted (Amazon AMI)
◆ Google Cloud hosted
◆ Redash hosted
➔ Variety of supported data integrations
13. Authentication & Authorization
➔ To authenticate users, generic users are created in Redash;
LDAP authentication is also possible
➔ The AWS Identity and Access Management (IAM) service is used for authorization
➔ To access Athena and S3 from Redash, the user needs to be authorized through
policies
➔ IAM has policies and roles; policies can be attached to users, groups, and roles
14. Authentication & Authorization / 2
Redash uses an API access key generated in IAM.
In IAM, the users/groups that own the key have policies granting access to Athena and S3.
[Diagram labels: Use API Key, Assign Policy, Access, IAM]
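As a rough illustration of what such a policy might contain, the sketch below assembles an IAM policy document as a Python dictionary. The function name and both bucket names are hypothetical, and the action list is a deliberately small assumption, not the policy used in the talk; a real deployment would likely also need data catalog (AWS Glue) permissions.

```python
import json


def redash_athena_policy(data_bucket, results_bucket):
    """Return a minimal IAM policy document for querying S3 data through Athena.

    Assumption: source data lives in `data_bucket` and Athena writes
    query results to `results_bucket`.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Start queries and read their status and results.
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                    "athena:StopQueryExecution",
                ],
                "Resource": "*",
            },
            {
                # Read-only access to the source data.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{data_bucket}",
                    f"arn:aws:s3:::{data_bucket}/*",
                ],
            },
            {
                # Read and write access to the query results location.
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:GetObject",
                    "s3:ListBucket",
                    "s3:PutObject",
                ],
                "Resource": [
                    f"arn:aws:s3:::{results_bucket}",
                    f"arn:aws:s3:::{results_bucket}/*",
                ],
            },
        ],
    }


# Hypothetical bucket names.
policy = redash_athena_policy("example-data-bucket", "example-athena-results")
policy_json = json.dumps(policy, indent=2)
```

The JSON text in `policy_json` is the form that would be attached to the IAM user or group whose key Redash uses.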
16. Notes and Tips About Billing
➔ The elastic nature of EC2 services can cost more than you expect
◆ Try to stay within the AWS Free Tier as much as possible
Redash requires a t2.small instance, which unfortunately falls outside the Free Tier
◆ Remember to keep an eye on the Billing & Cost Management Dashboard
◆ Create an AWS Budget and budget reports to track your expenses
◆ Do not forget to stop instances when you are not using them
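To see why stopping an idle instance matters, a back-of-the-envelope calculation is enough. In the sketch below the hourly rate is a placeholder, not a real AWS price; look up the current on-demand rate for your region before relying on any figure. Storage and data transfer charges are left out.

```python
def monthly_compute_cost(hourly_rate_usd, hours_per_day, days=30):
    """Estimate the compute charge for one instance over a month.

    Covers running hours only; EBS storage is still billed while an
    instance is stopped and is not included here.
    """
    return hourly_rate_usd * hours_per_day * days


PLACEHOLDER_RATE = 0.02  # hypothetical USD per hour, NOT an actual AWS price

always_on = monthly_compute_cost(PLACEHOLDER_RATE, hours_per_day=24)
office_hours = monthly_compute_cost(PLACEHOLDER_RATE, hours_per_day=8)
savings = always_on - office_hours
```

Whatever the real rate is, running the instance 8 hours a day instead of 24 cuts the compute charge to one third.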
17. Build and Demo
AWS S3 + Athena + Redash
To start the open-source, AWS-based setup, follow the steps at
https://redash.io/help/open-source/setup#aws
18. Building Instance Tips
➔ When you first create the instance, save the server's Amazon private key securely
➔ Convert the Amazon .pem private key to .ppk with PuTTYgen to use it in PuTTY
➔ Design your inbound/outbound security rules carefully, and create your own restricted IAM policies
➔ Make sure you select t2.small to be able to run Redash
➔ At the time of writing, in Europe Athena is available only in the eu-west-1 and eu-west-2 regions
➔ If you plan to reuse a stopped EBS-backed instance, do not terminate it
➔ Make sure your S3 bucket access policy is restrictive,
e.g. no public buckets, no full S3 access
➔ For advanced data security, consider encrypting the data in your S3 buckets
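The "no public buckets, no full S3 access" tip can be turned into a quick self-check on a policy document before it is applied. The sketch below is a simplified, hypothetical linter in plain Python; it looks only at `Effect`, `Principal`, and `Action`, ignores `Condition` and `NotAction`, and is no substitute for AWS's own policy validation tools.

```python
def _as_list(value):
    """Policy fields may hold a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def find_policy_risks(policy):
    """Return a list of coarse warnings for an IAM or bucket policy dictionary."""
    risks = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        is_public = principal == "*" or (
            isinstance(principal, dict) and "*" in _as_list(principal.get("AWS", []))
        )
        if is_public:
            risks.append("public principal")
        if any(action in ("*", "s3:*") for action in _as_list(stmt.get("Action", []))):
            risks.append("full S3 access")
    return risks


# Hypothetical policies for illustration.
too_open = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Principal": "*", "Action": "s3:*",
         "Resource": "arn:aws:s3:::example-bucket/*"}
    ],
}
restricted = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:GetObject", "s3:ListBucket"],
         "Resource": ["arn:aws:s3:::example-bucket", "arn:aws:s3:::example-bucket/*"]}
    ],
}

open_risks = find_policy_risks(too_open)
restricted_risks = find_policy_risks(restricted)
```

An empty result means only that these two coarse patterns were not found, not that the policy is safe.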