2. About Me: Neil Dahlke
Engineer
MemSQL
• real-time database for transactions / analytics
Formerly Globus
• high performance data transfer for research scientists
Past talks
• Real-time, Geospatial, Maps
Slides: http://www.slideshare.net/MemSQL/realtime-geospatial-maps-by-neil-dahlke
4. WHAT WE ARE SEEING:
Sensors. Applications. Machines. And us.
Generating more data every single day.
By 2020, over 20 billion connected things will be in use across a range of industries.
6. WHAT DO REAL-TIME BUSINESSES NEED?
FAST DATA INGEST
The rate at which data can be ingested into the database
7. WHAT DO REAL-TIME BUSINESSES NEED?
LOW-LATENCY QUERIES
The time it takes to execute queries and receive results
8. WHAT DO REAL-TIME BUSINESSES NEED?
HIGH CONCURRENCY
The ability to scale simultaneous operations
9. WHAT DO REAL-TIME BUSINESSES NEED?
FAST DATA INGEST
The rate at which data can be ingested into the database
LOW-LATENCY QUERIES
The time it takes to execute queries and receive results
HIGH CONCURRENCY
The ability to scale simultaneous operations
11. A massively scalable database and ingest solution enabled massive growth, real-time analytic applications, and faster, more targeted advertising.
12. Before
Kafka
• The component they kept
S3
• Persisted all logs to cold storage for eventual analysis
Hadoop
• Nightly map-reduce jobs
Redshift
• Took a full day to load the previous day's data
• Load windows began overlapping into the next day, causing a data crisis
13. Why was this bad for their business operations?
No real-time access to analytics
No SQL interface for analysts and data scientists
Massive nightly Hadoop batch jobs (late data)
Unfiltered and incomplete data (silos)
Expensive
14. Why was this bad for their data operations?
Too slow
Not scalable
No deduplication
• i.e., no exactly-once semantics
Low concurrency
FAST DATA INGEST · LOW-LATENCY QUERIES · HIGH CONCURRENCY
24. Visualizing the Data
Demo built using
• Mapbox
• Websockets
• Tornado web server
When an image is repinned, the circles on the globe expand, highlighting higher-volume areas
Reads data from MemSQL directly (the server side is sketched below)
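A minimal sketch of what the demo's server side could look like under these assumptions: Tornado serves a websocket, polls MemSQL for recent repins per region, and pushes JSON the Mapbox front end can use to size its circles. The endpoint, poll interval, and the repin_events table (shared with the other sketches in these notes) are illustrative placeholders, not the actual demo code.

```python
import json
import pymysql
import tornado.ioloop
import tornado.web
import tornado.websocket

clients = set()

class RepinSocket(tornado.websocket.WebSocketHandler):
    def open(self):
        clients.add(self)

    def on_close(self):
        clients.discard(self)

def broadcast_counts():
    # One connection per tick keeps the sketch simple; a real server
    # would reuse a connection or a pool.
    conn = pymysql.connect(host="memsql-master", user="root",
                           password="", database="events_db")
    with conn.cursor() as cur:
        # Repins per region over the last five seconds, read straight
        # from the table the Kafka pipeline is filling.
        cur.execute("""
            SELECT region, COUNT(*) AS repins
            FROM repin_events
            WHERE ts > NOW() - INTERVAL 5 SECOND
            GROUP BY region
        """)
        rows = cur.fetchall()
    conn.close()
    payload = json.dumps([{"region": r, "repins": c} for r, c in rows])
    for ws in list(clients):
        ws.write_message(payload)

app = tornado.web.Application([(r"/repins", RepinSocket)])
app.listen(8888)
# Poll every second and fan the aggregates out to connected browsers.
tornado.ioloop.PeriodicCallback(broadcast_counts, 1000).start()
tornado.ioloop.IOLoop.current().start()
```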
- Distributed in-memory database
- Built for real-time analytics and transactions
- Familiar SQL interface
- Spark integration out of the box
- Native Kafka ingestion (pipeline sketch below)
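The deck doesn't show the DDL, but native Kafka ingestion in MemSQL is driven by CREATE PIPELINE. Here is a minimal sketch: MemSQL speaks the MySQL wire protocol, so pymysql stands in for any client, and the host names, topic, table, and schema are illustrative placeholders.

```python
import pymysql  # any MySQL-protocol client works against MemSQL

conn = pymysql.connect(host="memsql-master", port=3306, user="root",
                       password="", database="events_db")
with conn.cursor() as cur:
    # Target table for raw repin events (illustrative schema; the
    # PRIMARY KEY matters for the dedup sketch further down).
    cur.execute("""
        CREATE TABLE IF NOT EXISTS repin_events (
            event_id BIGINT NOT NULL,
            user_id  BIGINT,
            pin_id   BIGINT,
            region   VARCHAR(64),
            ts       DATETIME,
            PRIMARY KEY (event_id)
        )
    """)
    # CREATE PIPELINE points the database directly at a Kafka topic;
    # MemSQL pulls and loads batches itself, with no separate ETL process.
    cur.execute("""
        CREATE PIPELINE repin_pipeline AS
        LOAD DATA KAFKA 'kafka-broker:9092/repins'
        INTO TABLE repin_events
        FIELDS TERMINATED BY ','
    """)
    cur.execute("START PIPELINE repin_pipeline")
```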
What did they want to do?
- Highly scalable infrastructure that collects, stores, and processes user engagement data in real time
- Higher-performance event logging
- Reliable log transport and storage
- The ability to query real-time data
The original pipeline:
- A user clicks Pin or repins
- The event is pushed to Apache Kafka (producer sketched after this list)
- Storm, Spark, and other custom-built log readers process these events in real time
- A log persistence service called Secor reliably writes these events to Amazon S3 (zero data loss, overcoming S3's weak eventual-consistency model)
- A self-serve big data platform loads the data from S3 into many different Hadoop clusters for batch processing
- In-house tools Singer (logger) and Secor (replicator) asynchronously replicate local logs from app servers to a centralized S3 location, using Kafka for transport
- Kafka was great for throughput, but they needed a way to derive value, e.g. run SQL against these datasets in real time
- A few days later this data would reach Redshift and become queryable
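To make the event-push step concrete, here is a minimal sketch using the kafka-python client. The topic name, event fields, and CSV layout are illustrative (chosen to match the pipeline sketch above), not Pinterest's actual Singer/Secor format.

```python
import random
import time
from kafka import KafkaProducer  # kafka-python package

producer = KafkaProducer(bootstrap_servers=["kafka-broker:9092"])

def on_repin(user_id, pin_id, region):
    """Fire-and-forget publish; Storm, Spark, Secor, and the MemSQL
    pipeline all read the same topic independently."""
    event_id = random.getrandbits(63)          # placeholder unique id
    ts = time.strftime("%Y-%m-%d %H:%M:%S")
    # CSV line matching the repin_events columns used in the sketches.
    line = "%d,%d,%d,%s,%s" % (event_id, user_id, pin_id, region, ts)
    producer.send("repins", line.encode("utf-8"))

on_repin(user_id=42, pin_id=1001, region="us-west")
producer.flush()
```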
Pain points:
- It took several days to access analytics and make them available to the data science team (too late for A/B testing and advertising)
- No SQL interface
- 5.5M rows/second for one topic, 1.7M rows/second for another, with the lowest-throughput topic at 132k rows/second
- Data needs to be filtered as well as enriched
- At-least-once semantics, i.e. duplicates are possible (see the dedup sketch below)
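At-least-once delivery means the same event can arrive more than once. One common way to get effectively exactly-once storage (a general technique, not necessarily Pinterest's mechanism) is to make the insert idempotent against a unique key, as in this sketch built on the illustrative repin_events table above:

```python
import pymysql

conn = pymysql.connect(host="memsql-master", user="root", password="",
                       database="events_db")

def store_event(event_id, user_id, pin_id, region, ts):
    with conn.cursor() as cur:
        # A redelivered event collides with the PRIMARY KEY on event_id
        # and is skipped, so replaying a Kafka partition cannot
        # double-count repins.
        cur.execute(
            "INSERT IGNORE INTO repin_events "
            "(event_id, user_id, pin_id, region, ts) "
            "VALUES (%s, %s, %s, %s, %s)",
            (event_id, user_id, pin_id, region, ts),
        )
    conn.commit()
```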
After:
- Goes both ways
- Easily repeatable success
- Days to seconds
- Now has a source of record for sharing relevant user engagement data and metrics with their data analysts and with key brands
- Pinterest and their partners can get a better understanding of user behavior and provide more value to the Pinner community
- Cheaper
- The ability to identify (and react to) developing trends as they happen
- Provides insight into how users are engaging with Pins across the globe in real time
- Helps Pinterest become a better recommendation engine
- SQL interface for engineering and data science teams
- Fast ad-hoc query execution on real-time data, allowing SQL queries on events as they arrive
Demo walkthrough:
- Pull up Ops
- Pull up a terminal and create the database
- Deploy Spark
- Create a Streamliner pipeline
- Create a MemSQL Pipelines pipeline
- Expose the UI
- Ad-hoc queries, Tableau, and custom reporting (example query below)
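As a final illustration, an ad-hoc query of the kind the demo ends on. Because MemSQL exposes a SQL interface over the MySQL protocol, the same statement works from a terminal, from Tableau, or from a script; table and column names follow the earlier illustrative sketches.

```python
import pymysql

conn = pymysql.connect(host="memsql-master", user="root", password="",
                       database="events_db")
with conn.cursor() as cur:
    # Top pins by repins over the last minute, queried while the
    # Kafka pipeline is still ingesting.
    cur.execute("""
        SELECT pin_id, COUNT(*) AS repins
        FROM repin_events
        WHERE ts > NOW() - INTERVAL 1 MINUTE
        GROUP BY pin_id
        ORDER BY repins DESC
        LIMIT 10
    """)
    for pin_id, repins in cur.fetchall():
        print(pin_id, repins)
```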