Strata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics: Novus, DigitalOcean, Akamai.
Building Predictive Applications with Real-Time Data Pipelines and Streamliner. Eric Frenkiel, CEO and Co-Founder, MemSQL
Unlocking Real-Time Analytics and Predictive Applications with MemSQL Streamliner
1. End User Panel on Real-Time Data Analytics
Building Predictive Applications with
Real-Time Data Pipelines and Streamliner
Eric Frenkiel, CEO and Co-Founder, MemSQL
2. Going Real-Time is the Next Phase for Big Data
More Devices
More Interconnectivity
More User Demand
…and companies are at risk of being left behind
3. MemSQL Architecture
Streaming Data Warehouse
Streaming: Integrated streaming with Streamliner
Database: High volume transactions for structured and unstructured data
Data Warehouse: Fast, scalable SQL for immediate analytics
4. Applications and Technology Trends
Real-Time Analytics Risk-Management Personalization
Portfolio Tracking
Monitoring and
Detection
Internet of Things | Real-Time Data Pipelines | Operationalizing Apache Spark
6. Changing the Way the World Invests
Noah Zucker, Vice President – Tactical Engineering, Novus Partners
Scalable Portfolio Intelligence with MemSQL
7. 100+ Investment Managers, $2 Trillion AUM
Research Platform: 10,000+ Institutions
Founded 2007, Privately Held
We help investors discover their true
investment acumen and risk
About Novus
10. 24/7 ETL Handholding
Overnight Failure =
Business Hours Slowdown
Scala worker pool limited
by the database
Non-trivial code changes
needed to shard and scale
Before MemSQL…
15. Ian Hansen, Software Engineering Manager
DigitalOcean
ETL Tools for Small Teams
16. Problem: Business Intelligence Slows as We Grow
Data lives in SQL
Easy to ask new questions in SQL
But… Business Intelligence tasks taking longer
Database isn’t built for quick aggregations
17. Solution: Scale-out SQL Database
SQL team stays powerful
Quick to iterate with quick answers
Prepare for the future!
18. Problem: Data isn’t in MemSQL
Plus
You don’t have an engineer on
your team
It’s hard to get an engineer’s time
You’ve got a job to do…
(which is taking more and more
time)
19. Solution: ETL Using REPLACE INTO
MySQL SQL flavor (available in MemSQL)
Handles new rows and updates on rows
Easy to write
• Query source database then replace into target database
Many other scale-out SQL databases don’t have
equivalent
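The "query source, then replace into target" pattern can be sketched as follows. This is only an illustration under stated assumptions: in-memory SQLite databases stand in for both the source system and MemSQL (SQLite happens to accept REPLACE INTO as well), and the users table and its columns are hypothetical names that do not come from the talk.

```python
import sqlite3

# Hypothetical source and target databases. In-memory SQLite stands in for the
# real source system and for MemSQL; this sketch does not connect to either.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

# "users" and its columns are made-up names for illustration only.
source.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
target.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")


def sync_users():
    """Query the source, then REPLACE INTO the target (rows are keyed on id)."""
    rows = source.execute("SELECT id, email FROM users").fetchall()
    target.executemany("REPLACE INTO users (id, email) VALUES (?, ?)", rows)
    target.commit()


# First run: two new rows are copied.
source.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com")],
)
sync_users()

# Second run: one changed row and one new row. The changed row overwrites the
# old copy in the target instead of creating a duplicate.
source.execute("UPDATE users SET email = 'b2@example.com' WHERE id = 2")
source.execute("INSERT INTO users VALUES (3, 'c@example.com')")
sync_users()

result = target.execute("SELECT id, email FROM users ORDER BY id").fetchall()
print(result)
```

The same statement covers new rows and updates to existing rows, so the job can simply be re-run on a schedule without tracking which rows were already copied.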
20. Problem: Now Load JSON Event Data
~300K events per day
Many different types of JSON events
21. Solution: MemSQL Loader + JSON Type
Only loads new files (or files
whose content has changed)
Parallelizes the process
Transformation script
simple: return id and raw json data
SQL team unaffected by new
JSON events
./memsql-loader load /opt/events/** \
    --table events \
    --script=/opt/events-etl \
    --file-id-column file_id \
    --columns id,data
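A transformation script of the kind described ("return id and raw json data") might look roughly like the sketch below. The slides do not show the actual /opt/events-etl script, so everything here is an assumption: that each input line is one JSON event, that events carry an "id" field, and that rows are emitted as tab-separated id and raw JSON to match --columns id,data.

```python
import io
import json


def transform(lines, out):
    """Write one tab-separated (id, raw JSON) row per input event line."""
    for line in lines:
        raw = line.strip()
        if not raw:
            continue  # skip blank lines
        event = json.loads(raw)  # parsing also rejects lines that are not JSON
        # "id" is an assumed field name; the raw text is passed through
        # unchanged so new event types need no changes to this script.
        out.write(f"{event['id']}\t{raw}\n")


# Two hypothetical events of different types, separated by a blank line.
sample_lines = [
    '{"id": 1, "type": "click"}',
    "",
    '{"id": 2, "type": "signup", "plan": "pro"}',
]
sample = io.StringIO("\n".join(sample_lines) + "\n")
output = io.StringIO()
transform(sample, output)
print(output.getvalue(), end="")
```

If the loader pipes each file through the script's standard input and output (an assumption here, not something the slides state), the entry point would be transform(sys.stdin, sys.stdout). Because only the id is extracted and the rest is stored as raw JSON, the SQL side stays the same when new event types appear.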
22. Problem: Processing Data on Select
Need computed value in SQL query
Computing the value slows down queries
Computed value used on many queries
• e.g. domain from a URL string
23. Solution: Persistent Columns
Pre-compute result and
save it on the row
Automatically updated if
row changes
No need to alter ETL
pipeline
ALTER TABLE events
ADD COLUMN (
  referring_domain AS
    substring_index(substring(data::$referrer, (locate('//', data::$referrer)) + 2), '/', 1)
  PERSISTED varchar(255)
)
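To make the nested SQL expression easier to follow, here is a small Python equivalent of what it computes: everything after the first "//" in the referrer, up to the next "/". The function name is hypothetical and the code is only a reading aid, not part of the MemSQL setup.

```python
def referring_domain(referrer: str) -> str:
    """Mirror substring_index(substring(s, locate('//', s) + 2), '/', 1)."""
    # locate() is 1-based and returns 0 when '//' is missing; str.find() is
    # 0-based and returns -1, so find() + 2 lands on the same character.
    start = referrer.find("//") + 2
    rest = referrer[start:]
    # substring_index(rest, '/', 1) keeps everything before the first '/'.
    return rest.split("/", 1)[0]


print(referring_domain("http://example.com/b"))
print(referring_domain("https://sub.example.org"))
```

A referrer without "//" is handled the same way as in the SQL expression, which then starts from the second character; neither version validates the URL.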
24. Solution: Persistent Columns
Use pre-computed value in select
memsql> select data, referring_domain from events limit 2;
+-------------------------------------+------------------+
| data                                | referring_domain |
+-------------------------------------+------------------+
| {"referrer":"http://example.com/b"} | example.com      |
| {"referrer":"http://example.com/a"} | example.com      |
+-------------------------------------+------------------+
25. Tools
REPLACE INTO syntax
JSON native type
MemSQL Loader
Persistent columns
Now, MemSQL Streamliner
28. Mike DePrizio, Senior Architect, Akamai Technologies
Unlocking Revenue with In-Memory Technology
29. We are the leading provider of
cloud services for delivering,
optimizing and securing online
content and business applications
CORPORATE STATS (2014):
$1.96B Revenue
1,300 Locations
5,000+ Customers
5,100+ Employees
OUR HISTORY:
Founded 1998 and rooted in MIT
technology—solving Internet
congestion with math not hardware
30. The Business of Billing
Billing domino effect
Akamai → Customers → Sub-customers
Daily billing requires:
Fast data delivery
Accurate data
Old Model: Generating a bill at end of month for customer services
New Model: Generating a bill at the end of every day for sub-customer services
31. Current Billing Data Management
Gather logs from 190,000+ servers in 1400 locations in 110
countries
Multiple PBs/day aggregate/reduce into relevant billing data feed
Typical data record: 3 key fields plus metrics
Load resulting data record into our RDBMS system
32. Greatest Challenges
Current system cannot handle expected throughput
Difficult to quickly scale up existing environments
New model will generate 10x+ data
33. Deploying MemSQL
Application
Daily Sub-customer billing
Problem
Existing RDBMS pipeline loads were maxed out at 150-300K upserts/second
and could not keep up with the projected size of the new billing model
Results
MemSQL cluster performs at 1.9
million upserts/second, allowing
transition from monthly to daily billing
Billing Data: resource usage statistics
INSERT... ON DUPLICATE KEY UPDATE... (1.9 million/sec)
Billing Application
• Compute sub-customer
charges daily
• Roll up sub-customer usage by
customer/cloud provider
• More sophisticated platform
offers customers better
service, partners new business
opportunities
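The upsert-then-roll-up flow on this slide can be sketched as below. This is an illustration under stated assumptions, not Akamai's pipeline: SQLite stands in for MemSQL, and since SQLite has no INSERT ... ON DUPLICATE KEY UPDATE, the sketch swaps in SQLite's equivalent INSERT ... ON CONFLICT DO UPDATE (SQLite 3.24 or later). The table, column names, and sample values are hypothetical; the "3 key fields plus one metric" shape follows the record description given earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical billing table: three key fields plus one metric.
conn.execute(
    """
    CREATE TABLE usage_daily (
        customer TEXT,
        sub_customer TEXT,
        day TEXT,
        bytes INTEGER NOT NULL,
        PRIMARY KEY (customer, sub_customer, day)
    )
    """
)

# Stand-in for MemSQL's INSERT ... ON DUPLICATE KEY UPDATE: insert a new key,
# or add the incoming metric to the row that already holds that key.
UPSERT = """
    INSERT INTO usage_daily (customer, sub_customer, day, bytes)
    VALUES (?, ?, ?, ?)
    ON CONFLICT (customer, sub_customer, day)
    DO UPDATE SET bytes = bytes + excluded.bytes
"""

records = [
    ("cloud-a", "sub-1", "2015-09-30", 100),
    ("cloud-a", "sub-1", "2015-09-30", 50),  # same key: accumulates to 150
    ("cloud-a", "sub-2", "2015-09-30", 10),
    ("cloud-b", "sub-9", "2015-09-30", 7),
]
conn.executemany(UPSERT, records)
conn.commit()

# Daily usage per sub-customer, the basis for daily charges.
per_sub = conn.execute(
    "SELECT customer, sub_customer, bytes FROM usage_daily "
    "ORDER BY customer, sub_customer"
).fetchall()

# Roll up sub-customer usage by customer.
rollup = conn.execute(
    "SELECT customer, SUM(bytes) FROM usage_daily "
    "GROUP BY customer ORDER BY customer"
).fetchall()

print(per_sub)
print(rollup)
```

Keying the table on the three key fields lets repeated records for the same sub-customer and day fold into a single row at load time, so the daily roll-up is a plain GROUP BY.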
34. Results Speak for Themselves
2M upserts/second on AWS EC2
instances
Scalability on commodity hardware
Meeting our billing windows
Unlocking revenue
35. Adapt PoC for real-world
situations
Continue scaling linearly
Optimize results with small
cluster deployment
What Next?
36. Eric Frenkiel, MemSQL CEO and co-founder
September 30, 2015 • New York, NY
Introducing MemSQL Streamliner
37. One click deployment of
integrated Apache Spark
Put Spark in the Fast Lane
• GUI pipeline setup
• Multiple data pipelines
• Real-time transformation
Eliminates batch ETL
Open source on GitHub
Introducing the MemSQL Streamliner
42. Streamliner Architecture
First of many integrated Apache Spark solutions
[Architecture diagram. Labels: Other Real-Time Data Sources, Application, Apache Spark, STREAMLINER, Future Solution, Future Machine Learning Solution]
45. Streamliner Benefits
Build end-to-end data pipelines in minutes
Reduce data latency from days or hours to ZERO
Support thousands of concurrent users running real-time
queries
Give users immediate access to fresh data via innovative
applications