"Building Real-Time Data Pipelines with Kafka and MemSQL" by Rick Negrin, Director of Product Management at MemSQL for Orange County Roadshow March 17, 2017.
Machines and the Magic of Fast Learning (SingleStore)
Human-machine interaction is no longer the exclusive province of science fiction. The advance of the internet and connected devices has inspired data scientists to create machine-learning applications to extract value from these new forms of data.
So what's the next frontier?
Join MemSQL Engineer Michael Andrews and Sr. Director Mike Boyarski to learn how to use real-time data as a vehicle for operationalizing machine-learning models. Michael and Mike will explore advanced tools, including TensorFlow, Apache Spark, and Apache Kafka, and compelling use cases demonstrating the power of machine learning to effect positive change.
You will learn:
Top technologies for building the ideal machine-learning stack
How to power machine-learning applications with real-time data
A use case and demo of machine learning for social good
Building the Ideal Stack for Machine Learning (SingleStore)
Machine Learning is not new, but its application across memory-optimized distributed systems has led to an explosion in both the number and capability of its uses. Pandora develops personalized content recommendations with machine learning algorithms, Tesla has produced the first widely distributed autonomous vehicle, and Amazon uses autonomous robots to move packages within its warehouses and even deliver packages. When coupled with real-time data, advanced analytics approaches like machine learning and deep learning create immediate business opportunities.
Machine learning has never been more accessible—if your data pipelines support real-time analysis. Attendees will learn tools and techniques for integrating machine learning models across industries and organizations. Steven Camiña, MemSQL Product Manager, will walk through critical technologies needed in your technology ecosystem, including Python, Apache Kafka, Apache Spark, and a real-time database.
CTO View: Driving the On-Demand Economy with Predictive Analytics (SingleStore)
In the on-demand economy, real-time analytics is both a necessity and a competitive advantage. The next evolution in the on-demand economy is predictive analytics fueled by live streams of data—in effect, knowing what customers want before they do. This session will feature technical examples of real-time pipelines, machine learning, and custom dashboards, as well as off-the-shelf dashboards with Tableau.
Operationalizing Machine Learning at Scale at Starbucks (Databricks)
As ML-driven innovations are propelled by self-service capabilities in the enterprise data and analytics platform, teams face significant entry barriers and productivity issues in moving from POCs to operating ML-powered apps at scale in production.
Winning the On-Demand Economy with Spark and Predictive Analytics (SingleStore)
Today’s on-demand economy drives companies to provide fast load times, personalization, and instantaneous service for hungry end users across all types of applications. Yet most still use dated, legacy systems to process and analyze data. In this session, Ankur Goyal, VP of Engineering at MemSQL, will showcase implementing a one-click Lambda architecture with Apache Spark, Apache Kafka, and an operational database, resulting in lightning-fast analytics on large, changing datasets.
Tapjoy: Building a Real-Time Data Science Service for Mobile Advertising (SingleStore)
Robin Li, Director of Data Engineering, and Yohan Chin, VP of Data Science at Tapjoy, share how to architect the best application experience for mobile users using technologies including Apache Kafka, Apache Spark, and MemSQL.
Speaker: Robin Li - Director of Data Engineering, Tapjoy and Yohan Chin - VP Data Science, Tapjoy
Building the Next-gen Digital Meter Platform for Fluvius (Databricks)
Fluvius is the network operator for electricity and gas in Flanders, Belgium. Their goal is to modernize the way people look at energy consumption using a digital meter that captures consumption and injection data from any electrical installation in Flanders, ranging from households to large companies. After full roll-out there will be roughly 7 million digital meters active in Flanders, collecting up to terabytes of data per day. Combine this with regulation requiring Fluvius to maintain a record of these readings for at least 3 years, and we are talking petabyte scale. delaware BeLux was assigned by Fluvius to set up a modern data platform, and did so on Azure using Databricks as the core component to collect, store, process, and serve these volumes of data to every single consumer in Flanders and beyond. This enables the Belgian energy market to innovate and move forward. Maarten took up the role of project manager and solution architect.
O'Reilly Media Webcast: Building Real-Time Data Pipelines (SingleStore)
As our customers tap into new sources of data or modify existing data pipelines, we are often asked questions like: What technologies should we consider? Where can we reduce data latency? How can we simplify our data architecture?
To eliminate the guesswork, we teamed up with Ben Lorica, Chief Data Scientist at O’Reilly Media to host a webcast centered around building real-time data pipelines.
If your heart breaks when you hear terrible stories about preschoolers having chemo when they should be having nothing but fun; if you think the teams of doctors and scientists working tirelessly to end childhood disease deserve the means to continue research and healing....
Digital Business Transformation in the Streaming Era (Attunity)
Enterprises are rapidly adopting stream computing backbones, in-memory data stores, change data capture, and other low-latency approaches for end-to-end applications. As businesses modernize their data architectures over the next several years, they will begin to evolve toward all-streaming architectures. In this webcast, Wikibon, Attunity, and MemSQL will discuss how enterprise data professionals should migrate their legacy architectures in this direction. They will provide guidance for migrating data lakes, data warehouses, data governance, and transactional databases to support all-streaming architectures for complex cloud and edge applications. They will discuss how this new architecture will drive enterprise strategies for operationalizing artificial intelligence, mobile computing, the Internet of Things, and cloud-native microservices.
Link to the Wikibon report - wikibon.com/wikibons-2018-big-data-analytics-trends-forecast
Link to Attunity Streaming CDC Book Download - http://www.bit.ly/cdcbook
Link to MemSQL's Free Data Pipeline Book - http://go.memsql.com/oreilly-data-pipelines
Introducing Amazon Kinesis: Real-time Processing of Streaming Big Data (BDT10...) (Amazon Web Services)
This presentation will introduce Kinesis, the new AWS service for real-time streaming big data ingestion and processing.
We’ll provide an overview of the key scenarios and business use cases suitable for real-time processing, and discuss how AWS designed Amazon Kinesis to help customers shift from traditional batch-oriented processing of data to a continual real-time processing model. We’ll provide an overview of the key concepts, attributes, APIs, and features of the service, and discuss building a Kinesis-enabled application for real-time processing. We’ll also contrast this with other approaches for streaming data ingestion and processing. Finally, we’ll discuss how Kinesis fits as part of a larger big data infrastructure on AWS, including S3, DynamoDB, EMR, and Redshift.
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe... (Amazon Web Services)
Originally, Hadoop was used as a batch analytics tool; however, this is rapidly changing, as applications move towards real-time processing and streaming. Amazon Elastic MapReduce has made running Hadoop in the cloud easier and more accessible than ever. Each day, tens of thousands of Hadoop clusters are run on the Amazon Elastic MapReduce infrastructure by users of every size — from university students to Fortune 50 companies. We recently launched Amazon Kinesis – a managed service for real-time processing of high volume, streaming data. Amazon Kinesis enables a new class of big data applications which can continuously analyze data at any volume and throughput, in real-time. Adi will discuss each service, dive into how customers are adopting the services for different use cases, and share emerging best practices. Learn how you can architect Amazon Kinesis and Amazon Elastic MapReduce together to create a highly scalable real-time analytics solution which can ingest and process terabytes of data per hour from hundreds of thousands of different concurrent sources. Forever change how you process web site click-streams, marketing and financial transactions, social media feeds, logs and metering data, and location-tracking events.
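Kinesis preserves per-key ordering by hashing each record's partition key into a 128-bit keyspace that is split into contiguous ranges, one per shard. The following is a minimal local sketch of that routing idea only; the `shard_for` helper and the shard count are illustrative stand-ins, not the AWS SDK.

```python
import hashlib

NUM_SHARDS = 4
KEYSPACE = 2 ** 128  # Kinesis hashes partition keys into a 128-bit keyspace

def shard_for(partition_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a partition key to a shard, mimicking MD5 hash-range routing."""
    h = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    # Which of the num_shards equal-width hash ranges the key falls into.
    return h * num_shards // KEYSPACE

# Records with the same partition key always land on the same shard,
# which is what preserves per-key ordering under parallel consumption.
events = [("user-17", "click"), ("user-42", "view"), ("user-17", "purchase")]
routed = [(shard_for(key), key, payload) for key, payload in events]
```

Because routing depends only on the key, the two "user-17" events above are guaranteed to be read back in order by whichever consumer owns that shard.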
Breakout session during Proact's SYNC 2013, 18 September 2013.
Software Defined Storage: Clustered Data ONTAP, The Storage Hypervisor, by Wessel Gans (NetApp)
Data Con LA 2022 - Making real-time analytics a reality for digital transform... (Data Con LA)
Fadi Azhari, VP of Marketing, StarRocks
- Enterprises are facing an imperative to grow their business to gain competitive advantage at breakneck speed. They need to achieve that by adding new value services efficiently and effectively.
- To achieve growth from these new services, enterprises need new insights instantly from their constantly changing data.
- Unfortunately, current data infrastructure offers sub-optimal solutions that leave customers wrestling to achieve their business goals.
Why is real-time analytics so difficult?
- Data freshness and fast responsiveness are both important and present technical challenges of their own.
- User-facing analytics and operational analytics mean supporting thousands of users simultaneously.
- You have to build and maintain many de-normalized tables (de-normalization jobs) in streaming pipelines, which is very difficult.
- You can't easily update the data in real time to analyze business changes.
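The de-normalization pain point above can be seen in a toy sketch: once a pipeline materializes a denormalized table, every upstream dimension change forces a rewrite of historical rows. All names and data here are hypothetical.

```python
# Two normalized "tables" kept in memory for illustration.
users = {1: {"user_id": 1, "name": "Ada", "tier": "gold"}}
denorm = []  # the denormalized view a streaming pipeline must maintain

def on_order(order):
    # The denormalization job: copy user attributes into every order row.
    denorm.append({**order, **users[order["user_id"]]})

def on_user_update(user_id, field, value):
    users[user_id][field] = value
    # Pain point: every historical denormalized row must be rewritten too.
    for row in denorm:
        if row["user_id"] == user_id:
            row[field] = value

on_order({"order_id": 100, "user_id": 1, "amount": 25})
on_user_update(1, "tier", "platinum")  # one dimension change touches all rows
```

With normalized tables and a fast join at query time, the user update would be a single-row write; the denormalized pipeline instead fans it out across the whole history.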
StarRocks re-invents real-time analytics with the only platform uniquely designed for the next generation real-time Enterprise, unleashing the power of business intelligence to help organizations accelerate their digital transformation. StarRocks makes real-time analytics a reality with the fastest, easy-to-use analytics platform on the planet.
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ... (Precisely)
Tackling the challenge of designing a machine learning model and putting it into production is the key to getting value back – and the roadblock that stops many promising machine learning projects. After the data scientists have done their part, engineering robust production data pipelines has its own set of challenges. Syncsort software helps the data engineer every step of the way.
Building on the process of finding and matching duplicates to resolve entities, the next step is to set up a continuous streaming flow of data from data sources so that as the sources change, new data automatically gets pushed through the same transformation and cleansing data flow – into the arms of machine learning models.
Some of your sources may already be streaming, but the rest are sitting in transactional databases that change hundreds or thousands of times a day. The challenge is that you can’t affect the performance of data sources that run key applications, so putting something like database triggers in place is not the best idea. Using Apache Kafka or similar technologies as the backbone for moving data around doesn’t solve the problem of needing to grab changes from the source, push them into Kafka, and consume the data from Kafka to be processed. If something unexpected happens – like connectivity being lost on either the source or the target side – you don’t want to have to fix it or start over because the data is out of sync.
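One common way to get the resume-instead-of-start-over behavior described above is offset checkpointing: only advance a durable checkpoint after the downstream write succeeds, so a reconnecting consumer skips what it already delivered. This is a simplified in-memory sketch under stated assumptions; the stream, checkpoint store, and sink are invented stand-ins, not Syncsort or Kafka APIs.

```python
# A change stream with monotonically increasing offsets.
changes = [{"offset": i, "id": i, "op": "update"} for i in range(10)]

checkpoint = {"offset": -1}   # stands in for durable checkpoint storage
sink = []                     # stands in for the downstream target

def consume(stream, fail_at=None):
    for event in stream:
        if event["offset"] <= checkpoint["offset"]:
            continue  # already delivered before the failure; skip it
        if fail_at is not None and event["offset"] == fail_at:
            raise ConnectionError("target unreachable")
        sink.append(event)                      # durable write first...
        checkpoint["offset"] = event["offset"]  # ...then record progress

try:
    consume(changes, fail_at=5)   # simulate losing the target mid-stream
except ConnectionError:
    pass
consume(changes)                  # reconnect: resumes at offset 5, no duplicates
```

Writing to the sink before advancing the checkpoint gives at-least-once delivery; deduplicating on the offset at the target upgrades that to effectively exactly-once.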
View this 15-minute webcast on-demand to learn how to tackle these challenges in large scale production implementations.
By 2020, 50% of all new software will process machine-generated data of some sort (Gartner). Historically, machine data use cases have required non-SQL data stores like Splunk, Elasticsearch, or InfluxDB.
Today, new SQL DB architectures rival the non-SQL solutions in ease of use, scalability, cost, and performance. Please join this webinar for a detailed comparison of machine data management approaches.
RISC and Velostrata 2/28/2018: Lessons in Cloud Migration (RISC Networks)
Learn how to accelerate and de-risk your cloud migration project.
Despite the surge in enterprises migrating applications to the public cloud, more than half of all projects are delayed or over budget, and an even greater number are more difficult than expected.¹
Cloud migrations don’t begin when you start moving applications into the cloud; they begin with discovery and assessment of your application landscape. The second phase comprises the actual migration, where applications are moved to the public cloud. Working with purpose-built, enterprise-grade cloud migration platforms, especially ones that partner to integrate both phases, greatly simplifies and accelerates projects.
RISC Networks and Velostrata have teamed up to deliver this webinar where we’ll share real-world examples, tips, and tricks on crafting a seamless cloud migration from start to completion.
Modernizing your Application Architecture with Microservices (Confluent)
Organizations are quickly adopting microservice architectures to achieve better customer service and improve user experience while limiting downtime and data loss. However, transitioning from a monolithic architecture based on stateful databases to truly stateless microservices can be challenging and requires the right set of solutions.
In this webinar, learn from field experts as they discuss how to convert the data locked in traditional databases into event streams using HVR and Apache Kafka®. They will show you how to implement these solutions through a real-world demo use case of microservice adoption.
You will learn:
-How log-based change data capture (CDC) converts database tables into event streams
-How Kafka serves as the central nervous system for microservices
-How the transition to microservices can be realized without throwing away your legacy infrastructure
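To illustrate the event shapes log-based CDC produces, here is a toy diff between two snapshots of a keyed table. Real CDC tools such as HVR read the database transaction log rather than diffing snapshots; this sketch only shows the resulting insert/update/delete event stream, and all table and field names are hypothetical.

```python
def diff_to_events(before: dict, after: dict, key: str = "id"):
    """Turn two snapshots of a keyed table into insert/update/delete events."""
    events = []
    for k in after:
        if k not in before:
            events.append({"op": "insert", key: k, "row": after[k]})
        elif after[k] != before[k]:
            events.append({"op": "update", key: k, "row": after[k]})
    for k in before:
        if k not in after:
            events.append({"op": "delete", key: k})
    return events

# Two versions of a tiny "customers" table, keyed by id.
v1 = {1: {"name": "Ada"}, 2: {"name": "Bob"}}
v2 = {1: {"name": "Ada Lovelace"}, 3: {"name": "Cy"}}
events = diff_to_events(v1, v2)
# Each event is what a microservice would consume from a Kafka topic.
```

Publishing these events to a topic per table is what lets stateless microservices rebuild exactly the views they need instead of querying the monolith's database directly.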
Bring Your Mission-Critical Data to Your Cloud Apps and Analytics (Precisely)
To stay competitive, you need to swiftly deliver innovative web and mobile apps and analytics solutions that include all your critical data—including mainframe and IBM i. Join us to hear how forward-thinking companies are using modern cloud-based platforms to deliver solutions that drive better customer experiences and greater insight—all while extending the value of their core systems.
Denodo DataFest 2017: Outpace Your Competition with Real-Time Responses (Denodo)
Watch the presentation on-demand now: https://goo.gl/kceFTe
Today’s digital economy demands a new way of running business. Flexible access to information and responses in real time are essential for outpacing competition.
Watch this Denodo DataFest 2017 session to discover:
• Data access challenges faced by organizations today.
• How data virtualization facilitates real-time analytics.
• Key use cases and customer success stories.
The Need For Speed - Strategies to Modernize Your Data Center (EDB)
Join Postgres expert Marc Linster and Nutanix Product Manager Jeremy Launier as they share strategies for creating agility in the enterprise, explain how to avoid the complexity and cost of legacy IT, and discuss the benefits of leveraging the cloud.
Highlights include:
- How to increase database flexibility and why it matters
- How to leverage the private cloud effectively
- How to maximize the benefit of on-premises DBaaS (Database as a Service)
This webinar is a joint session between EnterpriseDB and Nutanix, two companies recognized in the Gartner Magic Quadrant for operational database management systems and hyperconverged infrastructure.
The database market is large and filled with many solutions. In this talk, Seth Luersen from MemSQL will take a look at what is happening within AWS, the overall data landscape, and how customers can benefit from using MemSQL within the AWS ecosystem.
Converging Database Transactions and Analytics (SingleStore)
Delivered at the Gartner Data and Analytics 2018 show in Texas, this presentation discusses real-time applications and their impact on existing data infrastructures.
MemSQL 201: Advanced Tips and Tricks Webcast (SingleStore)
Topics discussed include differences between the columnstore and rowstore engines, data ingestion, data sharding and query tuning, and finally memory and workload management.
Watch the replay at https://memsql.wistia.com/medias/4siccvlorm
An Engineering Approach to Database Evaluations (SingleStore)
This talk will go over a methodical approach for making a decision, dig into interesting tradeoffs, and give tips about what things to look for under the hood and how to evaluate the tech behind the database.
Building a Fault Tolerant Distributed Architecture (SingleStore)
This talk will highlight some of the challenges to building a fault tolerant distributed architecture, and how MemSQL's architecture tackles these challenges.
Stream Processing with Pipelines and Stored Procedures (SingleStore)
This talk will discuss an upcoming feature in MemSQL 6.5 showing how advanced stream processing use cases can be tackled with a combination of stored procedures (new in 6.0) and MemSQL's pipelines feature.
Learn how to leverage MPP technology and distributed data to deliver high-volume transactional and analytical workloads, resulting in real-time dashboards on rapidly changing data using standard SQL tools. Demonstrations will include streaming structured and JSON data from Kafka messages through a micro-batch ETL process into the MemSQL database, where the data is then queried using standard SQL tools and visualized with Tableau.
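The micro-batch ETL step described above can be sketched in a few lines: accumulate messages into fixed-size batches, parse and filter the JSON payloads, then bulk-load each batch. This is a local simulation of the pattern, not MemSQL's pipeline syntax, and the sensor payloads are invented.

```python
import json

# Stand-in for messages pulled from a Kafka topic.
raw_messages = [json.dumps({"sensor": i % 3, "value": i * 1.5}) for i in range(10)]

def micro_batches(messages, batch_size):
    """Yield fixed-size slices of the message stream."""
    for i in range(0, len(messages), batch_size):
        yield messages[i:i + batch_size]

loaded = []  # stand-in for the target table
for batch in micro_batches(raw_messages, batch_size=4):
    rows = [json.loads(m) for m in batch]        # extract: parse JSON payloads
    rows = [r for r in rows if r["value"] >= 0]  # transform: filter bad readings
    loaded.extend(rows)                          # load: stand-in for a bulk INSERT
```

Batching the inserts is the point: one bulk write per batch amortizes per-statement overhead, which is what keeps ingest fast enough for the dashboards to stay fresh.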
This session will focus on image recognition, the techniques available, and how to put those techniques into production. It will further explore algebraic operations on tensors, and how that can assist in large-scale, high-throughput, highly-parallel image recognition.
LIVE DEMO: Constructing and executing a real-time image recognition pipeline using Kafka and Spark.
Speaker: Neil Dahlke, MemSQL Senior Solutions Engineer
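Why algebraic operations on tensors matter here: scoring a whole batch of flattened images against every class collapses into a single matrix multiply, which is exactly the kind of operation that parallelizes across cores and GPUs. A deliberately tiny sketch with made-up 4-pixel "images" and made-up class weights:

```python
def matmul(a, b):
    """(n x k) @ (k x m) matrix multiply with plain lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# Three tiny flattened "images" of 4 pixels each.
images = [[0.0, 1.0, 1.0, 0.0],
          [1.0, 0.0, 0.0, 1.0],
          [0.0, 1.0, 0.0, 0.0]]

# One row of weights per pixel, one column per class (2 classes).
weights = [[1.0, -1.0],
           [-1.0, 1.0],
           [-1.0, 1.0],
           [1.0, -1.0]]

scores = matmul(images, weights)                 # one score per (image, class)
labels = [row.index(max(row)) for row in scores]  # argmax over classes
```

The same shape scales up directly: a real pipeline batches frames from Kafka, and Spark (or a GPU library) executes the batched multiply for high-throughput recognition.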
How Database Convergence Impacts the Coming Decades of Data Management (SingleStore)
How Database Convergence Impacts the Coming Decades of Data Management by Nikita Shamgunov, CEO and co-founder of MemSQL.
Presented at NYC Database Month in October 2017. NYC Database Month is the largest database meetup in New York, featuring talks from leaders in the technology space. You can learn more at http://www.databasemonth.com.
James Burkhart explains how Uber supports millions of analytical queries daily across real-time data with Apollo. James covers the architectural decisions and lessons learned building an exactly-once ingest pipeline storing raw events across in-memory row storage and on-disk columnar storage, plus a custom metalanguage and query layer leveraging partial OLAP result-set caching and query canonicalization. Putting all the pieces together provides thousands of Uber employees with sub-second p95-latency analytical queries spanning hundreds of millions of recent events.
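Query canonicalization plus result-set caching, as described for Apollo, can be sketched as sorting a query's dimensions and filters into a stable cache key so logically identical queries share one cached entry. The query shape and field names below are invented for illustration, not Apollo's actual metalanguage.

```python
# Canonicalizing queries lets logically identical requests share a cache entry.
def canonicalize(query: dict) -> tuple:
    dims = tuple(sorted(query.get("group_by", [])))
    filters = tuple(sorted(query.get("filters", {}).items()))
    return (query["metric"], dims, filters)

cache = {}

def run(query, execute):
    key = canonicalize(query)
    if key not in cache:          # only hit the storage layer on a cache miss
        cache[key] = execute(query)
    return cache[key]

calls = []
def execute(q):                   # stand-in for the OLAP storage layer
    calls.append(q)
    return {"count": 42}

q1 = {"metric": "trips", "group_by": ["city", "hour"], "filters": {"city": "SF"}}
q2 = {"metric": "trips", "group_by": ["hour", "city"], "filters": {"city": "SF"}}
r1, r2 = run(q1, execute), run(q2, execute)  # second query served from cache
```

Caching *partial* results, as the talk describes, extends this idea: the key covers only the cached time range, and the engine computes the uncached remainder and merges.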
Adjusting primitives for graph: SHORT REPORT / NOTES (Subhajit Sahu)
Graph algorithms, like PageRank. Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is...
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Quantitative Data Analysis: Reliability Analysis (Cronbach Alpha), Common Method... (2023240532)
Quantitative data Analysis
Overview
Reliability Analysis (Cronbach Alpha)
Common Method Bias (Harman Single Factor Test)
Frequency Analysis (Demographic)
Descriptive Analysis
Adjusting OpenMP PageRank: SHORT REPORT / NOTES (Subhajit Sahu)
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). On the other hand, the hybrid approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
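The uniform-vs-hybrid discussion above hinges on two primitives, multiply and sumAt. A sequential Python sketch of power-iteration PageRank built from those primitives (not the report's OpenMP code; the tiny cycle graph is invented) shows where each fits in the iteration:

```python
def multiply(xs, ys):
    """Elementwise multiply: the `multiply` primitive."""
    return [x * y for x, y in zip(xs, ys)]

def sum_at(contrib, sources_of):
    """The `sumAt` primitive: for each vertex, sum contributions of its in-neighbors."""
    return [sum(contrib[s] for s in srcs) for srcs in sources_of]

def pagerank(out_deg, sources_of, damping=0.85, iters=50):
    n = len(out_deg)
    rank = [1.0 / n] * n
    inv_deg = [1.0 / d if d else 0.0 for d in out_deg]
    for _ in range(iters):
        contrib = multiply(rank, inv_deg)  # each vertex's rank / out-degree
        rank = [(1 - damping) / n + damping * c
                for c in sum_at(contrib, sources_of)]
    return rank

# 3-vertex cycle 0 -> 1 -> 2 -> 0: by symmetry every rank converges to 1/3.
ranks = pagerank(out_deg=[1, 1, 1], sources_of=[[2], [0], [1]])
```

`multiply` is embarrassingly parallel, while `sumAt` is an irregular gather; the report's hybrid approach keeps such small or memory-bound primitives sequential when threading them does not pay off.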
The Building Blocks of QuestDB, a Time Series Database (javier ramirez)
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read replicas, and faster batch ingestion.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... (Subhajit Sahu)
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
19. Real-Time Scalable Multi-Cloud
• Fast ingest, low-latency queries, high concurrency
• Operational and ad-hoc analytics
• Powerful applications
• Petabyte scale
• Elastic
• Commodity hardware
• Low TCO
• Cloud managed service
• On-premises
• Multi-cloud deployments
• Enterprise-grade security
A Modern Data Warehouse for the Enterprise Frontlines
20. [Architecture diagram: streaming ingest carries real-time data through messaging and transforms, and historical data loads alongside it, into a real-time data warehouse via real-time data pipelines. Live data sits in memory-optimized tables and historical data in disk-optimized tables, serving business intelligence dashboards and real-time application analytics. Deployable on bare metal, virtual machines, or containers; on-premises, in the cloud, or as a service.]
26. An Insider’s View at Facebook
▪ Every piece of technology is scalable
▪ Analyzing data from hundreds of thousands of machines
▪ Delivering immense value in real-time
• Real-time code deployment
• Detecting anomalies
• A/B testing results
▪ Fundamentally making the business faster by providing data at your fingertips
28. Product or Services Scores for Operational Data Warehouse, from "Critical Capabilities for Data Warehouse and Data Management Solutions for Analytics," Gartner, July 2016.
31. Gartner Hybrid DBMS Cloud Scenarios
[Diagram: scenarios include Dev/Test, Disaster Recovery, Architecture Spanning, Use Case Specific, Multicloud, and Prod. Source: Gartner (April 2016)]
32. Real-Time Analytics at Your Service: Lift and Shift to the Cloud
▪ Fully managed database service for real-time analytics
▪ Fast data ingest from Amazon S3, Kinesis, or Apache Kafka
33. A Comprehensive Hybrid Cloud Approach
▪ On-premises deployments
• Any hardware, NUMA optimizations, vectorized processing
• Any Linux, VMs, containers
• Robust security including Role-based Access Control (RBAC)
• Perfect for retiring legacy appliances
▪ Any cloud
• Any cloud service IaaS
• Simple `replicate database` command
34. Strategic Approach to Global Analytics
▪ Develop a center of excellence for real-time infrastructure
• Include both real-time AND historical data
▪ Identify complements and replacements for traditional technologies
• Data warehouse appliances, rigid systems
▪ Choose a straightforward path
• Faster historical insights, real-time data, predictive analytics
36. Real-time analytics transformed profitability analysis of customer logistics from weekly to daily, and reduced latency from days to minutes.
37. BUSINESS BENEFITS
▪ Increased the frequency of customer logistics profitability analysis from weekly to daily
▪ Reduced latency from days to minutes
TECHNICAL BENEFITS
▪ Reduced a 22-hour ETL process to minutes
▪ Improved query response time by 80x
39. Reducing delay in “freshness of data” from two hours to 10 minutes.
40. BUSINESS BENEFITS
▪ Reach over 1.5 billion users across hundreds of thousands of mobile apps
▪ Real-time optimization of advertising campaigns with improved post-install engagements
▪ Net new personalized campaign performance reports for greater visibility
TECHNICAL BENEFITS
▪ 10x faster data refresh, from hours to minutes
▪ Ad-hoc queries on raw log-level data in seconds
▪ Real-time deduplication
▪ 1 TB/day ingest
41. THE MANAGE REAL-TIME ARCHITECTURE
[Diagram: real-time inputs feeding real-time analytics.]
http://www.enterprisetech.com/2016/12/09/managing-30b-bid-requests/
45. SECURE DATA LAKE ARCHITECTURE
http://itblog.emc.com/2016/05/03/simplifying-lives-emc-myservice360/
46. The global leader in Content Delivery Network services has deployed MemSQL to enhance billing speed and efficiency for thousands of customers around the world.
47. BUSINESS BENEFITS
▪ Analyze historical against live data
▪ 30% increased forecast accuracy
▪ Tighter financial reporting to the Street
TECHNICAL BENEFITS
▪ Analyze millions of rows per second
▪ Massive concurrency
▪ 8 million upserts per second
▪ Process 1 PB of data per day
48. THE AKAMAI REAL-TIME ARCHITECTURE
[Diagram: customer traffic statistics feeding real-time analytics.]