Near Real-Time Data Analysis With FlyData

Move your data on the fly!

FlyData
Cloud-based big data integration
www.flydata.com

Difficulty of loading data to Redshift
~Difference between traditional DBs and Redshift~
• MySQL, PostgreSQL, Oracle, etc.: transactional RDB, loaded with SQL INSERT (synchronous)
• Amazon Redshift: data warehouse, loaded by bulk upload (asynchronous), but how?

Difficulty of loading data to Redshift
~Difference between traditional DBs and Redshift~
[Diagram: FlyData sits between the transactional RDB (MySQL, PostgreSQL, Oracle, etc.; synchronous SQL INSERT) and the Amazon Redshift data warehouse (asynchronous).]

Process of upload to Redshift
1. Data extraction (E)
2. Transform data (T)
3. Upload TSV file to S3
4. Run COPY command to load data from S3 to Redshift (L)
5. Error Handling
[Diagram: log files on the client server go through data extraction (E) and transform (T), are uploaded to S3 as TSV, and are loaded to the DB (L), Amazon Redshift, with error handling.]
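
The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not FlyData's implementation: the function names, the JSON-lines input, the table name, the S3 URI and the IAM role are all hypothetical, and the actual S3 upload and the execution of COPY on Redshift are left out.

```python
import csv
import io
import json


def extract(log_lines):
    """Step 1 (E): parse JSON log lines; collect malformed lines separately."""
    records, errors = [], []
    for line in log_lines:
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            errors.append(line)  # step 5: kept aside for error handling
    return records, errors


def transform(records, columns):
    """Step 2 (T): project each record onto a fixed column order."""
    return [[record.get(column, "") for column in columns] for record in records]


def to_tsv(rows):
    """Step 3 (first half): serialize rows as TSV text, ready to upload to S3."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def copy_statement(table, s3_uri, iam_role):
    """Step 4 (L): build the COPY statement that would be run on Redshift."""
    # All three arguments are placeholders supplied by the caller.
    return (
        f"COPY {table} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role}' DELIMITER '\\t';"
    )
```

The TSV text returned by to_tsv would be uploaded to S3 with whatever client is in use before the COPY statement is run.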

FlyData: Near Real-Time Upload to Redshift
Manage all with FlyData
[Diagram: client server and Amazon Redshift, with FlyData managing the whole pipeline between them.]

FlyData Architecture
[Diagram labels: client server with FlyData Client, logs, FlyData Cloud (ELB, FD Data Server, S3), Amazon Redshift, Data Stats.]

FlyData Features

FlyData – A service for Amazon Redshift
1. Continuous Loading
2. Flexible JSON format Support
3. Query Scheduling and Management
4. All-in-One package for Amazon Redshift

Continuous Loading
• Near real-time data: sends data to Redshift periodically, every 5 minutes
• Scaling: FlyData can handle large amounts of data (100GB+ per day) for many tables, while optimizing appropriately with scheduled COPY commands
• Error handling
– Retry and notifications
– Works even when Redshift is in its maintenance window
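
As a rough illustration of the "retry and notifications" idea, here is a minimal Python sketch. The slides do not describe FlyData's retry policy, so the exponential backoff, the attempt limit, and the load_batch and notify callables are all assumptions made for this example.

```python
import time


def load_with_retry(load_batch, notify, max_attempts=5, base_delay=1.0,
                    sleep=time.sleep):
    """Call load_batch() until it succeeds or max_attempts is reached.

    load_batch and notify are hypothetical callables supplied by the caller:
    load_batch performs one load attempt, notify sends a message to an operator.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return load_batch()
        except Exception as error:
            if attempt == max_attempts:
                # Give up: notify, then re-raise the last error.
                notify(f"load failed after {attempt} attempts: {error}")
                raise
            # Assumed policy: wait 1x, 2x, 4x, ... base_delay between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
```

A load that fails while Redshift is unavailable (for example during a maintenance window) would simply be attempted again after the delay.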

Nested JSON and Apache Log Formats
• Support for Nested JSON logs and Apache log formats, not yet offered by AWS
• Dynamic Column Creation
– Brings flexibility to tables
– Less need to predefine table schema
• Smooth handling of nested data
– Auto-creation of parent-child table relationships
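
Dynamic column creation can be pictured with the sketch below. It is not FlyData's code: the type mapping, the VARCHAR length and the function names are assumptions chosen for illustration. It compares a JSON record against the columns a table already has and emits an ALTER TABLE ... ADD COLUMN statement for each new key.

```python
def infer_redshift_type(value):
    """Pick a Redshift column type for a JSON value (assumed mapping)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE PRECISION"
    return "VARCHAR(256)"  # assumed default for strings and anything else


def missing_column_statements(table, existing_columns, record):
    """Return ALTER TABLE statements for record keys the table lacks."""
    statements = []
    for name, value in record.items():
        if name not in existing_columns:
            # Identifiers are used as-is; real code would validate or quote them.
            statements.append(
                f"ALTER TABLE {table} ADD COLUMN {name} {infer_redshift_type(value)};"
            )
    return statements
```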

Example of auto-creating tables from JSON Logs
[Figure: sample JSON logs and the tables they are stored as in Redshift (RS).]

Flexible JSON format Support
• Your JSON log can be loaded into Redshift directly!
• Automatic creation of tables and columns for Redshift from your JSON log
• Nested JSON support
– Handles structure by creating parent-child table relations with foreign keys
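
One way to picture the parent-child handling is the sketch below. It is an assumption-laden illustration, not FlyData's algorithm: the table and column naming scheme is invented here, and only nested lists of objects are split out. Each nested list becomes rows of a child table that carry the parent's key as a foreign key.

```python
def split_nested(record, parent_table, parent_key):
    """Split one JSON record into a parent row and child-table rows.

    Naming is hypothetical: a nested list under key "items" in table "orders"
    goes to child table "orders_items" with foreign key column "orders_id".
    """
    parent_row = {}
    child_rows = {}
    foreign_key = f"{parent_table}_{parent_key}"
    for name, value in record.items():
        if isinstance(value, list):
            child_table = f"{parent_table}_{name}"
            child_rows[child_table] = [
                {foreign_key: record[parent_key], **item} for item in value
            ]
        else:
            parent_row[name] = value
    return parent_row, child_rows
```

For example, an order record with an "items" list yields one row for the orders table and one row per item for the orders_items table, each pointing back to the order's id.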

Query Scheduling and Management
• Stored SQL management on web console
• Mail notifications and downloads for queries that take a long time to run
• Periodic query scheduling (under development)
– Time-scheduled query processing
– Running maintenance tasks

All-in-One package for Amazon Redshift
• We are an Amazon Redshift partner
– Officially listed on https://aws.amazon.com/redshift/partners/
• Complete technical support for FlyData & Redshift
• As a Reseller Partner, we can provide Amazon Redshift under a flexible pricing schedule

FlyData Sync
• Released in January 2014
• Enables synchronization from an RDBMS to Redshift (currently supporting MySQL)
• Just another feature of FlyData for Redshift
– Easy setup through web/command-line interface
– One-line install command
• Supports INSERT / DELETE / UPDATE statements

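FlyData Sync for MySQL
[Diagram: in the customer data center or cloud, the FlyData Client gets binlog access to MySQL, either directly or through a read replica fed by replication (the read replica is optional). Data passes through scalable data servers and Amazon S3 to a load controller, which performs load optimization for Redshift and loads into Amazon Redshift.]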

FlyData Sync Requirements
• Support currently limited to MySQL
• FlyData module must be installed on a data server with access to MySQL transaction logs
• Supported MySQL DB Engines: InnoDB and MyISAM
• Transaction log format: ROW
– --binlog-format=ROW
• Synced tables must have a primary key set
• For data types not supported on Redshift:
– MySQL's "binary" and "varbinary" are converted to "VARCHAR", etc.
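
The requirements above could be checked before setup with something like the following sketch. The function name and the shape of the tables argument are hypothetical. It assumes the caller has already read the server's binlog_format value (for example with SHOW VARIABLES LIKE 'binlog_format') and each table's engine and primary-key status from MySQL.

```python
SUPPORTED_ENGINES = {"InnoDB", "MyISAM"}


def check_sync_requirements(binlog_format, tables):
    """Return a list of human-readable problems; an empty list means OK.

    tables is a list of dicts with hypothetical keys:
    "name", "engine" and "has_primary_key".
    """
    problems = []
    if binlog_format.upper() != "ROW":
        problems.append(f"binlog_format is {binlog_format}, expected ROW")
    for table in tables:
        if table["engine"] not in SUPPORTED_ENGINES:
            problems.append(
                f"table {table['name']} uses unsupported engine {table['engine']}"
            )
        if not table["has_primary_key"]:
            problems.append(f"table {table['name']} has no primary key")
    return problems
```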

Use Case: Game Analytics
• Multi-platform game titles
– The FlyData client module makes them easy to manage
• Basic log format: JSON
– Makes analytics flexible and reduces data
• Large amounts of data in popular titles (200GB / day)
– Large amounts of data are concentrated in a specific table
– Hard to load in real time (due to Redshift restrictions)
FlyData can handle it!

Contact Information
• sales@flydata.com
• Toll Free: 1-855-427-9787
• http://flydata.com
We are an official data integration partner of Amazon Redshift

FlyData Autoload: Use Cases
Move your data on the fly!

Real-time analytics for gaming client
• Case
– Client is a leading mobile gaming company in Japan with multiple released game titles
– Previously, a large amount of data was stored in a MySQL cluster
– MySQL often went down because of the large amount of data. Repair took weeks of man-hours every time this happened.
– Historical analysis over multiple years was simply impossible, given the data size.
• Solution
– Implemented FlyData Enterprise with JSON logs across multiple titles
– Outputs user activity by application into JSON log files
– Data is automatically fed to Amazon Redshift
• Result
– Engineering time is saved and real-time BI insights can be fed back into the application development cycle
– Client saves 2 weeks of man-hours every month, with added insight into user behavior. As a result, the client continues to steadily grow its user base and its bottom line.

Data Analytics on Online Ad Effectiveness
• Case
– Client is an online advertisement startup in the US with display ads shown across multiple websites
– User activity, from the duration of engagement to the position of the cursor, is all logged to measure viewer engagement
– Client needs to save large amounts of data and be able to query that data in real time. This data is then used to generate Ad Performance Reports.
– Their initial option, Hadoop, turned out to be too costly in terms of engineering time. The learning curve for the team was steep, for both query generation and maintenance of their Hadoop clusters
• Solution
– Implemented FlyData Enterprise using "Extended" Apache logs
– Outputs all user activity in Apache logs with additional information appended, such as key-value pair information for URL parameters and custom variables
– Data is automatically fed to Amazon Redshift in the appropriate columns. When appropriate columns do not exist, the columns are added on the fly. This allows for added flexibility in table schema design
– Customer can now know the real-time effectiveness of their online advertisements through Ad Performance Reports
– The client's internal BI team can quickly analyze, in real time, which ads are working and which are not, and can gain insight or optimize for the best performing ads
• Result
– With a more cost-effective solution than Hadoop, the client was able to increase revenue by steadily increasing the quality of ads based on data gathered by FlyData and analyzed in Amazon Redshift.
– Client has implemented a scalable backend reporting system that can handle multi-TB sized ad campaigns.
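
To illustrate how URL parameters in an Apache log line can become key-value columns, here is a small Python sketch. The exact "Extended" Apache log format is not given in these slides, so this only parses the quoted request portion of a standard access-log line; the function name and the output column names are assumptions.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Matches the quoted request part of an access-log line,
# e.g. "GET /ad.gif?ad_id=42&pos=top HTTP/1.1"
REQUEST_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<url>\S+) (?P<protocol>[^"]+)"'
)


def request_columns(log_line):
    """Return a dict of column name -> value for one log line, or None."""
    match = REQUEST_PATTERN.search(log_line)
    if match is None:
        return None
    parts = urlsplit(match.group("url"))
    columns = {"method": match.group("method"), "path": parts.path}
    # Each URL parameter becomes its own column; a parameter named
    # "method" or "path" would overwrite the fixed columns in this sketch.
    columns.update(parse_qsl(parts.query))
    return columns
```

Parameters that do not yet have a column in the target table are the case where columns would be added on the fly.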

Faster Feedback, Faster Development Cycles
• Case
– Client is a digital media startup in the US whose website has seen rapid growth in user access, becoming one of the most "Like"d pages on Facebook (more than 10 million)
– User activity logs are carefully analyzed and assessed both for the website content and for the user experience
– Used log data to perform funnel analysis on customer conversion rates
– Client received user activity from its site as JSON objects, before storing it in MongoDB
– Given the nature of the queries they wanted to run, MongoDB became very slow as their user base grew
• Solution
– Implemented FlyData Enterprise using nested JSON logs
– Outputs all user activity as a JSON log file
– FlyData automatically uploads the data into Redshift, so the BI team (which is also the app development team) can simply query their user activity logs
– Client can now quickly perform funnel analysis on customer data
• Result
– Query speed dramatically improved. Queries that took 20 minutes before now take less than a minute, while keeping the flexibility of JSON.
– Faster development cycles (Build-Measure-Learn cycles) were achieved.

Check us out!
• http://flydata.com
• sales@flydata.com
• Toll Free: 1-855-427-9787
We are an official data integration partner of Amazon Redshift
