Converging Database Transactions and Analytics SingleStore
Delivered at the Gartner Data and Analytics 2018 show in Texas, this presentation discusses real-time applications and their impact on existing data infrastructures.
DataOps Automation for a Kafka Streaming Platform (Andrew Stevenson + Spiros ... HostedbyConfluent
DataOps challenges us to build data experiences in a repeatable way. For those with Kafka, this means finding a means of deploying flows in an automated and consistent fashion.
The challenge is to make the deployment of Kafka flows consistent across different technologies and systems: the topics, the schemas, the monitoring rules, the credentials, the connectors, the stream processing apps. And ideally not coupled to a particular infrastructure stack.
In this talk we will discuss the different approaches and benefits/disadvantages to automating the deployment of Kafka flows including Git operators and Kubernetes operators. We will walk through and demo deploying a flow on AWS EKS with MSK and Kafka Connect using GitOps practices: including a stream processing application, S3 connector with credentials held in AWS Secrets Manager.
Winning the On-Demand Economy with Spark and Predictive Analytics SingleStore
Today’s on-demand economy drives companies to provide fast load times, personalization, and instantaneous service for hungry end-users across all types of applications. Yet most still use dated, legacy systems to process and analyze data. In this session, Ankur Goyal, VP of Engineering at MemSQL, will showcase implementing a one-click Lambda Architecture with Apache Spark, Apache Kafka and an operational database, resulting in lightning-fast analytics on large, changing datasets.
The database market is large and filled with many solutions. In this talk, Seth Luersen from MemSQL takes a look at what is happening within AWS, the overall data landscape, and how customers can benefit from using MemSQL within the AWS ecosystem.
Building the Next-gen Digital Meter Platform for Fluvius Databricks
Fluvius is the network operator for electricity and gas in Flanders, Belgium. Their goal is to modernize the way people look at energy consumption using a digital meter that captures consumption and injection data from any electrical installation in Flanders, ranging from households to large companies. After full roll-out there will be roughly 7 million digital meters active in Flanders, collecting up to terabytes of data per day. Combined with the regulation that Fluvius has to maintain a record of these readings for at least 3 years, this means petabyte scale. delaware BeLux was assigned by Fluvius to set up a modern data platform and did so on Azure, using Databricks as the core component to collect, store, process and serve these volumes of data to every single consumer and beyond in Flanders. This enables the Belgian energy market to innovate and move forward. Maarten took up the role of project manager and solution architect.
CTO View: Driving the On-Demand Economy with Predictive Analytics SingleStore
In the on-demand economy, real-time analytics is both a necessity and a competitive advantage. The next evolution in the on-demand economy is predictive analytics fueled by live streams of data: in effect, knowing what customers want before they do. This session will feature technical examples of real-time pipelines, machine learning, and custom dashboards as well as off-the-shelf dashboards with Tableau.
Real-time Data Integration with Kafka and Cassandra (Ewen Cheslack-Postava, C... DataStax
Apache Kafka is a high throughput messaging system that companies like LinkedIn, Netflix, and AirBnB are adopting to handle massive real-time datasets. These datasets originate from dozens of systems -- from databases like Cassandra, to log files, to application data. And companies often need to adopt just as many tools to integrate that data for processing. This presentation introduces Kafka Connect, Kafka's new tool for scalable, fault-tolerant data import and export. We'll discuss existing tools in the space and how they fall short when applied to real-time data integration at scale. Then we'll explore Kafka Connect's design and how it compares to systems with similar goals, including key design decisions and tradeoffs. Finally, we'll discuss the current support for Cassandra connectors and how they can be combined with other connectors and stream processing frameworks to help you get more out of your data.
About the Speaker
Ewen Cheslack-Postava Engineer, Confluent, Inc.
Ewen Cheslack-Postava is a Kafka committer and engineer at Confluent building a stream data platform based on Apache Kafka to help organizations reliably and robustly capture and leverage all their real-time data.
Confluent building a real-time streaming platform using kafka streams and k... Thomas Alex
Jeremy Custenborder from Confluent talked about how Kafka brings an event-centric approach to building streaming applications, and how to use Kafka Connect and Kafka Streams to build them.
Protecting your data at rest with Apache Kafka by Confluent and Vormetric confluent
Learn how data in motion is secure within Apache Kafka and the broader Confluent Platform, while data at rest can be secured by solutions like Vormetric Data Security Manager.
"Building Real-Time Data Pipelines with Kafka and MemSQL" by Rick Negrin, Director of Product Management at MemSQL for Orange County Roadshow March 17, 2017.
Building a real-time pipeline from scratch that can handle a billion or more transactions per day, and store, analyze and visualize them all in real time, has never been easier. In this build-as-we-go talk, we’ll create a front-to-back architecture that does exactly that.
* we’ll start with a simple producer emitting a few messages and publishing them onto a Kafka queue
* on the consuming end of the queue, a Spark-based Streamliner process will pick them up and store them in MemSQL
* ZoomData will connect to MemSQL for real-time visualization, where we’ll be able to ask various questions and see answers change as data flows through the system
* we’ll quickly make the entire pipeline more complex by increasing both the amount and the complexity of the data, until reaching 100K transactions per second
As we walk through this demo, we will touch on cross-data-center Kafka and MemSQL set-ups and speed limitations, if any, and echo back to real-life use cases of a similar set-up used in Goldman’s Asset Management division for the purposes of Portfolio Management & Trading.
In this presentation we describe the design and implementation of Kafka Connect, Kafka’s new tool for scalable, fault-tolerant data import and export. First we’ll discuss some existing tools in the space and why they fall short when applied to data integration at large scale. Next, we will explore Kafka Connect’s design and how it compares to systems with similar goals, discussing key design decisions that trade off between ease of use for connector developers, operational complexity, and reuse of existing connectors. Finally, we’ll discuss how standardizing on Kafka Connect can ultimately lead to simplifying your entire data pipeline, making ETL into your data warehouse and enabling stream processing applications as simple as adding another Kafka connector.
AI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't Changed Raffael Marty
The year is 2017. Cyber security has been a discipline for many years and thousands of security companies are offering solutions to deter and block malicious actors in order to keep our businesses operating and our data confidential. But fundamentally, cyber security has not changed during the last two decades. We are still running Snort and Bro. Firewalls are fundamentally still the same. People get hacked because of their poor passwords, and we collect logs that we don't know what to do with. In this talk I will paint a slightly provocative and dark picture of security. Fundamentally, nothing has really changed. We'll have a look at machine learning and artificial intelligence and see how those techniques are used today. Do they have the potential to change anything? How will the future look with those technologies? I will show some practical examples of machine learning and argue that simpler approaches generally win. Maybe we find some hope in visualization? Or maybe augmented reality? We still have a ways to go.
Processing Real-Time Data at Scale: A streaming platform as a central nervous... confluent
(Marcus Urbatschek, Confluent)
Presentation during Confluent’s streaming event in Munich. This three-day hands-on course focused on how to build, manage, and monitor clusters using industry best-practices developed by the world’s foremost Apache Kafka™ experts. The sessions focused on how Kafka and the Confluent Platform work, how their main subsystems interact, and how to set up, manage, monitor, and tune your cluster.
Cloud Experience: Data-driven Applications Made Simple and Fast Databricks
Implementing a complex real-time data workflow is very challenging. This session will describe the architecture of a data platform that provides a single, secure, high-performance system that can be deployed in hybrid cloud architectures. We will present how to support simultaneous, consistent and high-performance access through multiple industry, open source and cloud-compatible standards of streaming, table, TSDB, object, and file APIs. A new serverless technology is also used in the architecture to support dynamic and flexible implementations. The presenter will also outline how the platform was integrated with the Spark ecosystem, including AI and ML tools, to simplify the development process.
Fast Data - Fast Cars: How Apache Kafka Is Revolutionizing the Data World confluent
For the automotive industry, as for every other sector, the digital transformation is also a digital revolution: new market players, new technologies, and data accruing in ever larger volumes create new opportunities but also new challenges, and they call not only for new IT architectures but also for entirely new ways of thinking.
60% of Fortune 500 companies rely on the comprehensive distributed streaming platform Apache Kafka® to implement their data streaming projects, among them AUDI AG.
In this webinar you will learn:
How Kafka serves as the foundation both for data pipelines and for applications that consume and process real-time data streams.
How Kafka Connect and Kafka Streams support business-critical applications
How Audi used Kafka and Confluent to build a Fast Data IoT platform that is revolutionizing the "Connected Car" domain
Speakers:
David Schmitz, Principal Architect, Audi Electronics Venture GmbH
Kai Waehner, Technology Evangelist, Confluent
VoltDB and HPE Vertica Present: Building an IoT Architecture for Fast + Big Data VoltDB
This webinar with Chris Selland of HPE Vertica and Dennis Duckworth of VoltDB addresses the growing challenges with managing a complex IoT solution and how to enable real-time operational interaction with comprehensive data analytics.
EDA Meets Data Engineering – What's the Big Deal? confluent
Presenter: Guru Sattanathan, Systems Engineer, Confluent
Event-driven architectures have been around for many years, much like Apache Kafka®, which was first open sourced in 2011. The reality is that the true potential of Kafka is only being realised now. Kafka is becoming the central nervous system of many of today’s enterprises. It is bringing a profound paradigm shift to the way we think about enterprise IT. What has changed in Kafka to enable this paradigm shift? Is it not just a message broker, and how are enterprises using it today? This session will explore these key questions.
Sydney: https://content.deloitte.com.au/20200221-tel-event-tech-community-syd-registration
Melbourne: https://content.deloitte.com.au/20200221-tel-event-tech-community-mel-registration
Best Practices for Streaming IoT Data with MQTT and Apache Kafka Kai Wähner
Organizations today are looking to stream IoT data to Apache Kafka. However, connecting tens of thousands or even millions of devices over unreliable networks can create some architecture challenges. In this session, we will identify and demo some best practices for implementing a large scale IoT system that can stream MQTT messages to Apache Kafka.
We use HiveMQ as an open source MQTT broker to ingest data from IoT devices, stream that data in real time into an Apache Kafka cluster for preprocessing (using Kafka Streams / KSQL), and perform model training and inference (using TensorFlow 2.0 and its TensorFlow I/O Kafka plugin).
We leverage additional enterprise components from HiveMQ and Confluent to allow easy operations, scalability and monitoring.
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S... Grid Dynamics
This presentation outlines key business drivers for real-time analytics applications in retail and describes the emerging architectures based on In-Stream Processing (ISP) technologies. The slides present a complete open blueprint for an ISP platform - including a demo application for real-time Twitter Sentiment Analytics - designed with 100% open source components and deployable to any cloud.
To learn more, read an adjoining blog series on this topic here: https://blog.griddynamics.com/in-stream-processing-service-blueprint
Flexible and Scalable Integration in the Automation Industry/Industrial IoT confluent
Speaker: Kai Waehner, Technology Evangelist, Confluent
Kafka-Native, End-to-End IIoT Data Integration and Processing with Kafka Connect, KSQL, and PLC4X
IIoT / Industry 4.0 with Apache Kafka, Connect, KSQL, Apache PLC4X Kai Wähner
Data integration and processing is a huge challenge in Industrial IoT (IIoT, aka Industry 4.0 or Automation Industry) due to monolithic systems and proprietary protocols. Apache Kafka, its ecosystem (Kafka Connect, KSQL) and Apache PLC4X are a great open source choice to implement this integration end to end in a scalable, reliable and flexible way.
This blog post gives a high-level overview of the challenges and of a good, flexible architecture. At the end, I share a video recording and the corresponding slide deck. These provide many more details and insights.
Apache Kafka is the de facto standard for real-time event streaming. It provides:
Open Source (Apache 2.0 License)
Global-scale
Real-time
Persistent Storage
Stream Processing
PLC4X allows vertical integration and lets you write software independent of PLCs, using JDBC-like adapters for various protocols such as Siemens S7, Modbus, Allen Bradley, Beckhoff ADS, OPC-UA, Emerson, Profinet, BACnet, and Ethernet.
Github example: https://github.com/kaiwaehner/iiot-integration-apache-plc4x-kafka-connect-ksql-opc-ua-modbus-siemens-s7
More details: http://www.kai-waehner.de/blog/2019/09/02/iiot-data-integr…and-apache-plc4x/
Video Recording: https://youtu.be/RWKggid25ds
The Rise Of Event Streaming – Why Apache Kafka Changes Everything Kai Wähner
Business digitalization trends like microservices, the Internet of Things or Machine Learning are driving the need to process events at a whole new scale, speed and efficiency. Traditional solutions like ETL/data integration or messaging are not built to serve these needs.
Today, the open source project Apache Kafka® is being used by thousands of companies, including over 60% of the Fortune 100, to power and innovate their businesses by focusing their data strategies around event-driven architectures leveraging event streaming. We will discuss the market and technology changes that have given rise to Kafka and to event streaming, and we will introduce the audience to the key aspects of building an event streaming platform with Kafka. Examples of productive use cases from the automotive, manufacturing and transportation sectors will showcase the power of event streaming.
Azure Data Explorer deep dive - review 04.2020 Riccardo Zamana
Full review (04.2020) of the Azure Data Explorer service. The slide deck is a review of Kusto in terms of usage, ingestion techniques, querying and exporting data, and using anomaly detection and clustering methods.
What is Innovation? How can cloud computing help you innovate? How can you make your applications smarter? Predictive? How can you interpret data and anticipate trends? With AWS Artificial Intelligence Solutions: Machine Learning, Rekognition, Polly; with serverless - Lambda, Step Functions.
Best Practices for Streaming IoT Data with MQTT and Apache Kafka® confluent
Watch this talk here: https://www.confluent.io/online-talks/best-practices-for-streaming-iot-data-with-MQTT-and-apache-kafka-on-demand
Organizations today are looking to stream IoT data to Apache Kafka. However, connecting tens of thousands or even millions of devices over unreliable networks can create some architecture challenges.
In this session, we will identify and demo some best practices for implementing a large scale IoT system that can stream MQTT messages to Apache Kafka.
You Can't Protect What you Can't See. AWS Security Best Practices - Session S... Amazon Web Services
AWS utilises a shared security model where both AWS and the customer share responsibility for the security of data, applications and resources. As part of this model, it is critical that customers leverage services such as AWS CloudTrail, Config, and more. Attend this session to learn best practices on how to leverage these and other AWS services to gain end-to-end visibility and robust security on AWS. You will also hear how customers leverage third-party tools such as the Splunk App for AWS as critical elements of their security posture.
Speakers: Dan Miller, Cloud Sales Director, APAC, Splunk & Simon O'Brien, Senior Systems Engineer, Splunk
Similar to Real-Time Analytics with Confluent and MemSQL (20)
MemSQL 201: Advanced Tips and Tricks Webcast SingleStore
Topics discussed include differences between columnstore and rowstore engines, data ingestion, data sharding and query tuning, and, lastly, memory and workload management.
Watch the replay at https://memsql.wistia.com/medias/4siccvlorm
An Engineering Approach to Database Evaluations SingleStore
This talk will go over a methodical approach for making a decision, dig into interesting tradeoffs, and give tips about what things to look for under the hood and how to evaluate the tech behind the database.
Building a Fault Tolerant Distributed Architecture SingleStore
This talk will highlight some of the challenges to building a fault tolerant distributed architecture, and how MemSQL's architecture tackles these challenges.
Stream Processing with Pipelines and Stored Procedures SingleStore
This talk will discuss an upcoming feature in MemSQL 6.5 showing how advanced stream processing use cases can be tackled with a combination of stored procedures (new in 6.0) and MemSQL's pipelines feature.
Learn how to leverage MPP technology and distributed data to deliver high-volume transactional and analytical workloads, resulting in real-time dashboards on rapidly changing data using standard SQL tools. Demonstrations will include the streaming of structured and JSON data from Kafka messages through a micro-batch ETL process into the MemSQL database, where the data is then queried using standard SQL tools and visualized with Tableau.
This session will focus on image recognition, the techniques available, and how to put those techniques into production. It will further explore algebraic operations on tensors, and how that can assist in large-scale, high-throughput, highly-parallel image recognition.
LIVE DEMO: Constructing and executing a real-time image recognition pipeline using Kafka and Spark.
Speaker: Neil Dahlke, MemSQL Senior Solutions Engineer
How Database Convergence Impacts the Coming Decades of Data Management SingleStore
How Database Convergence Impacts the Coming Decades of Data Management by Nikita Shamgunov, CEO and co-founder of MemSQL.
Presented at NYC Database Month in October 2017. NYC Database Month is the largest database meetup in New York, featuring talks from leaders in the technology space. You can learn more at http://www.databasemonth.com.
James Burkhart explains how Uber supports millions of analytical queries daily across real-time data with Apollo. James covers the architectural decisions and lessons learned building an exactly-once ingest pipeline storing raw events across in-memory row storage and on-disk columnar storage and a custom metalanguage and query layer leveraging partial OLAP result set caching and query canonicalization. Putting all the pieces together provides thousands of Uber employees with subsecond p95 latency analytical queries spanning hundreds of millions of recent events.
Machines and the Magic of Fast Learning SingleStore
Human-machine interaction is no longer the exclusive province of science fiction. The advance of the internet and connected devices has inspired data scientists to create machine-learning applications to extract value from these new forms of data.
So what's the next frontier?
Join MemSQL Engineer Michael Andrews and Sr. Director Mike Boyarski to learn how to use real-time data as a vehicle for operationalizing machine-learning models. Michael and Mike will explore advanced tools, including TensorFlow, Apache Spark, and Apache Kafka, and compelling use cases demonstrating the power of machine learning to effect positive change.
You will learn:
Top technologies for building the ideal machine-learning stack
How to power machine-learning applications with real-time data
A use case and demo of machine learning for social good
Tapjoy: Building a Real-Time Data Science Service for Mobile Advertising SingleStore
Robin Li, Director of Data Engineering, and Yohan Chin, VP Data Science at Tapjoy, share how to architect the best application experience for mobile users using technologies including Apache Kafka, Apache Spark, and MemSQL.
Speakers: Robin Li - Director of Data Engineering, Tapjoy and Yohan Chin - VP Data Science, Tapjoy
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
Learn SQL from basic queries to Advance queries manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf GetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source ) Copilot?
How can we build one?
Architecture and evaluation
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Analysis insight about a Flyball dog competition team's performance roli9797
Insights from my analysis of a Flyball dog competition team's performance over the last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... Subhajit Sahu
Abstract: Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by the submission of a large number of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
7. About Confluent and Apache Kafka
• Founded by the creators of Apache Kafka
• Founded September 2014
• Technology developed while at LinkedIn
• 73% of active Kafka committers
Leadership
• Cheryl Dalrymple, CFO
• Jay Kreps, CEO
• Neha Narkhede, CTO, VP Engineering
• Luanne Dauber, CMO
• Todd Barnett, VP WW Sales
• Jabari Norton, VP Business Dev
8. What is a Stream Data Platform?
[Diagram: Kafka as the stream data platform, connected to Apps, Search, NoSQL, RDBMS, Monitoring, Stream Processing, Real-time Analytics, Data Warehouse, and Hadoop]
• Synchronous Req/Response: 0 - 100s ms
• Near Real Time: > 100s ms
• Offline Batch: > 1 hour
Build streaming applications
Deploy streaming applications at scale
Monitor and manage streaming applications
Common Kafka Use Cases
• Log data
• Database changes
• Sensors and device data
• Monitoring streams
• Call data records
• Monitoring
• Asynchronous applications
• Fraud and security
12. Architecting for Real-Time Analytics
[Diagram components: Data Producers (simulating sensor activity), gateways, Message Queue, Data Transformation, Database (Fast, Performant Data Storage), User Interface]
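The components named on this slide can be sketched end to end with the Python standard library. This is only an illustration under stated assumptions, not the setup used in the talk: `queue.Queue` stands in for the message queue (Kafka in the talk), an in-memory `sqlite3` database stands in for the database (MemSQL in the talk), the gateways are omitted, and the sensor fields, the Fahrenheit-to-Celsius transformation, and all function names are hypothetical.

```python
import json
import queue
import random
import sqlite3


def produce(q, n_sensors=3, readings_per_sensor=4, seed=0):
    """Data producers: push simulated sensor readings onto the queue as JSON."""
    rng = random.Random(seed)
    for tick in range(readings_per_sensor):
        for sensor_id in range(n_sensors):
            msg = {
                "sensor_id": sensor_id,
                "tick": tick,
                "temp_f": round(rng.uniform(60.0, 100.0), 1),
            }
            q.put(json.dumps(msg))


def transform(raw):
    """Data transformation: parse the JSON message and convert F to C."""
    rec = json.loads(raw)
    temp_c = round((rec["temp_f"] - 32.0) * 5.0 / 9.0, 2)
    return rec["sensor_id"], rec["tick"], temp_c


def consume(q, conn):
    """Drain the queue, transform each message, and store the rows."""
    rows = []
    while True:
        try:
            rows.append(transform(q.get_nowait()))
        except queue.Empty:
            break
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def run_pipeline():
    """Wire the stages together and return (rows stored, per-sensor summary)."""
    q = queue.Queue()                      # stand-in for the message queue
    conn = sqlite3.connect(":memory:")     # stand-in for the database
    conn.execute(
        "CREATE TABLE readings (sensor_id INTEGER, tick INTEGER, temp_c REAL)"
    )
    produce(q)
    stored = consume(q, conn)
    # The "user interface" step is reduced to one analytical SQL query.
    summary = conn.execute(
        "SELECT sensor_id, COUNT(*), AVG(temp_c) FROM readings "
        "GROUP BY sensor_id ORDER BY sensor_id"
    ).fetchall()
    return stored, summary


if __name__ == "__main__":
    stored, summary = run_pipeline()
    print(f"stored {stored} rows")
    for sensor_id, count, avg_c in summary:
        print(f"sensor {sensor_id}: {count} readings, avg {avg_c:.2f} C")
```

Keeping each stage as its own function mirrors the slide's separation of concerns: any stage could be swapped for its production counterpart without touching the others.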
14. Designed for Modern Operational Workloads
Simple, Real-Time, Low Cost, Flexible
Scalable SQL
▪ Multi-mode
• OLTP, OLAP, HTAP
▪ Multi-model
• ANSI SQL
• Document/JSON
• Geospatial
In-Memory and Solid-State
▪ In-Memory rowstore
▪ Solid-state columnstore
▪ Stream directly to rowstore or columnstore
Distributed
▪ Distributed query optimizer and execution
▪ Scale-out on commodity hardware
Datacenter or Cloud
▪ Deploy on-premises
▪ Cloud agnostic
• Amazon
• Microsoft
• Google
• Digital Ocean
15. Real-Time Processing Features
▪ Ecosystem Compatibility
• MySQL Wire Protocol
• Stream processing through Integrated Apache Spark
▪ In-Memory Performance
• Code Compilation for SQL queries
• Maximum Concurrency with Lock-free components
• Full Data Durability and High Availability
▪ Distributed System Processing
• Distributed Database Joins
• Distributed Query Optimizer
▪ Multi-mode and Multi-model data
• In-Memory Rowstore and Flash/SSD Columnstore
• SQL, JSON and Geospatial data
16. Real-Time Data Pipelines with Spark
▪ MemSQL Streamliner is an integrated MemSQL and Apache Spark solution
▪ Deploys Apache Spark with one click
▪ Creates real-time data pipelines through a graphical UI
▪ Open sourced on GitHub at memsql.github.io/spark-streamliner
[Diagram: Real-Time Inputs, Streamliner (Apache Spark: Extract, Transform, Load), Real-Time Application]
17. MemSQL Ecosystem and Architecture
[Diagram labels: Orchestration / Containers; Cloud / On-Premises Platform; Messaging; Inputs; Real-Time Applications; Business Intelligence Dashboards; Relational, Key-Value, Document, Geospatial; Existing Data Stores (Hadoop, Amazon S3, MySQL); Rowstore; Columnstore; Real-Time Data Pipelines]
21. MemEx: IoT Showcase Application
- Combines MemSQL, Apache Kafka, and Spark for global supply chain management
- Enables enterprises to predict throughput of supply warehouses
- Processes 2 million data points, based on 2,000 sensors across 1,000 warehouses
28. + Top Energy Firm
Real-time drilling sensor data to manage the high stakes of producing oil in a depressed market and maximizing productivity.
29. TECHNICAL BENEFITS
- Enabled machine learning scoring of streaming data for real-time Predictive Analytics
- Integrated SAS BI PMML for deep analytics
- Joined multiple data types and third party sources including geospatial and weather data
30. Spark MLlib Predictive Model
[Diagram: REAL-TIME INPUTS, Streamliner, Raw Sensor 1 + Predictive Score 1 (S1, P1), BUSINESS LOGIC]
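The idea on this slide, pairing each raw sensor reading with a predictive score before business logic acts on it, can be sketched as follows. This is a hedged illustration, not the Spark MLlib model from the talk: a hand-written logistic function stands in for the trained model, and the feature names, coefficients, threshold, and function names are all hypothetical.

```python
import math

# Hypothetical coefficients standing in for a trained model (illustrative only).
WEIGHTS = {"vibration": 1.8, "temperature": 0.04}
BIAS = -6.0
ALERT_THRESHOLD = 0.8  # hypothetical cut-off used by the business logic


def predictive_score(reading):
    """Return a score in (0, 1) from a logistic model over the reading."""
    z = BIAS + sum(WEIGHTS[name] * reading[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def score_stream(readings):
    """Yield each raw reading together with its score (S1 + P1 on the slide)."""
    for reading in readings:
        yield reading, predictive_score(reading)


def business_logic(reading, score):
    """Decide what to do with a scored reading."""
    return "ALERT" if score >= ALERT_THRESHOLD else "OK"


if __name__ == "__main__":
    stream = [
        {"sensor_id": 1, "vibration": 0.5, "temperature": 70.0},
        {"sensor_id": 2, "vibration": 2.5, "temperature": 90.0},
    ]
    for reading, score in score_stream(stream):
        print(reading["sensor_id"], round(score, 3), business_logic(reading, score))
```

Because `score_stream` is a generator, each reading is scored as it arrives rather than after the whole batch is collected, which is the property the slide's real-time scoring depends on.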
31. Continued Rise of IoT
[Diagram labels: Sensor Array, PoS Systems, Connected Fleets, Mobile Apps, Security, Reporting Systems, Log Systems, Data Lake, Data Warehouse, Databases]
“By 2020, over 20 billion connected things will be in use across a range of industries; the IoT will touch every role across the enterprise.”
Source: Gartner
32. Amazon Invests in Drones for 30 Minute Post-Order Deliveries
“These are highly automated drones. They have what is called sense-and-avoid technology. That means, basically, seeing and then avoiding obstacles.”
Yahoo, January 2016: https://www.yahoo.com/tech/exclusive-amazon-reveals-details-about-1343951725436982.html
33. Fedex Breaks Record With 317 Million Packages Shipped Over Christmas 2015
“FedEx Ground continues to advance the industry’s most automated hub network with investments in package sortation systems that enable flexible and reliable operations and six-sided scanning tunnels that boost data and image capture.”
FedEx, October 2015: http://about.van.fedex.com/newsroom/global-english/fedex-forecasts-record-volume-this-holiday-season/
34. The Evolution of Data Analytics
Descriptive Analytics, Predictive Analytics, Real-Time Analytics