The document discusses several data operations and logistics problems created by deep learning: a lack of support for the AI software development lifecycle, the need to handle workloads beyond deep learning models, difficulty putting machine learning models into production, running models in multiple locations, data dependencies that are more costly than code dependencies, and conditions that change over time. It recommends addressing these problems through approaches such as stream-based architectures and containerization.
Graph Gurus Episode 26: Using Graph Algorithms for Advanced Analytics Part 1 (TigerGraph)
Full Webinar: https://info.tigergraph.com/graph-gurus-26
Have you ever wondered how routing apps like Google Maps find the best route from one place to another? Finding that route is solved by the Shortest Path graph algorithm. Today, graph algorithms are moving from the classroom to a host of important and valuable operational and analytical applications. This webinar will give you an overview of graph algorithms, how to use them, and the categories of problems they can solve, and then take a closer look at path algorithms. This webinar is the first part in a five-part series, each part examining a different type of problem to be solved.
Graph Gurus Episode 27: Using Graph Algorithms for Advanced Analytics Part 2 (TigerGraph)
Full Webinar: https://info.tigergraph.com/graph-gurus-27
What does finding the best location for a warehouse/office/retail store have in common with finding the most influential person in a referral network? Answer: they are both Centrality problems and can be solved with graph algorithms. Join us for Part 2 of our five-part webinar series on using graph algorithms for advanced analytics.
By attending this webinar you will:
- Hear about use cases for centrality graph algorithms
- Learn how to select the right algorithm for your use case
- Be able to run and tailor GSQL graph algorithms
Graph Gurus Episode 25: Unleash the Business Value of Your Data Lake with Gra... (TigerGraph)
Full Webinar: https://info.tigergraph.com/graph-gurus-25
A new weapon is available for businesses wanting to accomplish more with Hadoop: native parallel graphs can reveal the connections across multiple domains and datasets in data lakes and provide powerful insights to deliver superior outcomes. In this webinar we will explain how native parallel graphs can analyze the information in data lakes to enable the following outcomes:
- Recommending next best actions such as promoting a student loan to someone heading off to college, advocating life insurance to a newly married couple, and so on
- Improving network utilization by analyzing petabytes of data collected from millions of IoT devices across a smart grid
- Accelerating M&A activity by intelligently merging data lakes from multiple businesses
Using Graph Algorithms For Advanced Analytics - Part 4 Similarity 30 graph al... (TigerGraph)
Graph-based investigation often enables us to identify individuals who are of special interest, and their uniqueness is due in part to their pattern of interactions. For example:
- A patient whose carepath journey leverages best practices gained from using pattern matching algorithms that find similar issues among the data of 50 million patients
- An individual who builds a successful portfolio by implementing actions recommended by similarity algorithms that find equivalent actions by successful investors
- A participant in a criminal ring whose attempts at swindling are blocked by matching them to patterns of known fraudulent activity
Once you have identified such a pattern and a key individual, you want to search your data for similar occurrences. Similarity algorithms are the answer.
Using Graph Algorithms for Advanced Analytics - Part 2 Centrality (TigerGraph)
What does finding the best location for a warehouse/office/retail store have in common with finding the most influential person in a referral network? Answer: they are both Centrality problems and can be solved with graph algorithms.
Using Graph Algorithms for Advanced Analytics - Part 5 Classification (TigerGraph)
What atmospheric data will help you predict if it's going to rain, snow, or be windy? What position should that new athlete play? How well can you guess a person's demographic background, based on their chat activity? These are all classification problems -- trying to pick the right category or label for an entity, based on observable features. They can also be solved with machine learning.
Graph Gurus Episode 17: Seven Key Data Science Capabilities Powered by a Nati... (TigerGraph)
This webinar will demonstrate seven key data science capabilities using TigerGraph’s intuitive GUI, GraphStudio, and GSQL queries. In this episode, we:
- Share the capabilities and tie them to specific use cases across the healthcare, pharmaceutical, financial services, telecom, internet, and government industries
- Walk you through a sample dataset, the GraphStudio UI flow, and GSQL queries demonstrating the capabilities
- Cover client case studies for Amgen, Intuit, China Mobile, Santa Clara County, and other enterprise customers
DataOps: An Agile Method for Data-Driven Organizations (Ellen Friedman)
DataOps expands DevOps philosophy to include data-heavy roles (data engineering & data science). DataOps uses better cross-functional collaboration for flexibility, fast time to value and an agile workflow for data-intensive applications including machine learning pipelines. (Strata Data San Jose March 2018)
Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO... (Matt Stubbs)
Date: 14th November 2018
Location: Keynote Theatre
Time: 11:50 - 12:20
Speaker: Ellen Friedman
Organisation: MapR
About: We’ve seen that over 90% of our customers have large scale projects successfully in production. What are they doing right? And how can you adapt their effective habits to your own business?
Value comes from big data when you have successful production deployments of data-intensive AI and analytics applications tied to practical business goals. Doing this well can be difficult on many levels. Each business presents its own challenges, but we’ve observed a number of habits that are common to many of the organizations who are getting value from their production deployments.
This presentation will explore 7 key habits that can make a difference and use real world examples to show you why. From architecture to technology to organizational culture, you’ll learn practical approaches that can improve your likelihood of success in production.
Surprising Advantages of Streaming - ACM March 2018 (Ellen Friedman)
Shift to a new idea: stream instead of database as heart of your big data architecture. With the right capabilities for event-by-event streaming data transport (not processing) you get the flexibility of streaming microservices & much more. Includes real world use case examples.
7 Habits for Big Data in Production - keynote Big Data London Nov 2018 (Ellen Friedman)
You can improve your chances for success with data-intensive, large-scale applications (AI, machine learning and analytics) in production.
This keynote presentation from Big Data London shows you how.
Machine Learning Success: The Key to Easier Model Management (MapR Technologies)
Join Ellen Friedman, co-author (with Ted Dunning) of a new short O’Reilly book Machine Learning Logistics: Model Management in the Real World, to look at what you can do to have effective model management, including the role of stream-first architecture, containers, a microservices approach and a DataOps style of work. Ellen will provide a basic explanation of a new architecture that not only leverages stream transport but also makes use of canary models and decoy models for accurate model evaluation and for efficient and rapid deployment of new models in production.
Graph Databases and Machine Learning | November 2018 (TigerGraph)
Graph Database and Machine Learning: Finding a Happy Marriage. Graph databases and machine learning both represent powerful tools for getting more value from data. Learn how they can form a harmonious marriage to up-level machine learning.
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ... (MapR Technologies)
Big data technologies are being applied to a wide variety of use cases. We will review tangible examples of machine learning, discuss an autonomous driving project and illustrate the role of MapR in next generation initiatives. More: http://info.mapr.com/WB_Machine-Learning-for-Chickens_Global_DG_17.11.02_RegistrationPage.html
Designing data pipelines for analytics and machine learning in industrial set... (DataWorks Summit)
Machine learning has made it possible for technologists to do amazing things with data. Its arrival coincides with the evolution of networked manufacturing systems driven by IoT. In this presentation we’ll examine the rise of IoT and ML from a practitioner's perspective to better understand how applications of AI can be built in industrial settings. We'll walk through a case study that combines multiple IoT and ML technologies to monitor and optimize an industrial HVAC (heating and cooling) system. Through this instructive example you'll see how the following components can be put into action:
1. A StreamSets data pipeline that sources from MQTT and persists to OpenTSDB
2. A TensorFlow model that predicts anomalies in streaming sensor data
3. A Spark application that derives new event streams for real-time alerts
4. A Grafana dashboard that displays factory sensors and alerts in an interactive view
By walking through this solution step-by-step, you'll learn how to build the fundamental capabilities needed in order to handle endless streams of IoT data and derive ML insights from that data:
1. How to transport IoT data through scalable publish/subscribe event streams
2. How to process data streams with transformations and filters
3. How to persist data streams with the timeliness required for interactive dashboards
4. How to collect labeled datasets for training machine learning models
At the end of this presentation you will have learned how a variety of tools can be used together to build ML-enhanced applications and data products for instrumented manufacturing systems.
Speakers
Ian Downard, Sr. Developer Evangelist, MapR
William Ochandarena, Senior Director of Product Management, MapR
Cheryl Wiebe - Advanced Analytics in the Industrial World (Rehgan Avon)
2018 Women in Analytics Conference
https://www.womeninanalytics.org/
Cheryl will talk about her consulting practice in Industrial Solutions: analytic solutions for industrial IoT-enabled businesses, including connected factory, connected supply chain, smart mobility, and connected assets. Her path to this practice has bounced between hands-on systems development, IT strategy, business process reengineering, supply chain analytics, manufacturing quality analytics, and now Industrial IoT analytics. She worked in industry as a developer and as a management consultant, and started and sold a company, before settling in to pursue this topic as a career analytics consultant. Cheryl will shed light on what's happening in industrial companies struggling to make the transition to digital, what that means, and what barriers they're challenged with. She'll touch on how and where artificial intelligence, deep learning, and machine learning technologies are being used most effectively in industrial companies, and on the unique challenges these companies face. Reflecting on what's changed over the years, and her journey to witness this, Cheryl will pose what she considers important ideas for women (and men) to consider in pursuing an analytics career successfully and meaningfully.
Predictive Maintenance Using Recurrent Neural Networks (Justin Brandenburg)
My presentation from AnacondaCON 2018 where I discussed using Recurrent Neural Networks, Python, TensorFlow and the MapR Platform to develop and deploy a predictive maintenance model for an IoT device in the manufacturing industry.
The logistics of machine learning typically take waaay more effort than the machine learning itself. Moreover, machine learning systems aren't like normal software projects so continuous integration takes on new meaning.
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics... (The Hive)
Think Tank Event 10/23/2017, hosted by The Hive and presented by Ted Dunning, Chief Application Architect of MapR Technologies and Ellen Friedman of MapR Technologies.
MapR is an ideal scalable platform for data science and specifically for operationalizing machine learning in the enterprise. This presentation gives specific reasons why.
Kamanja: Driving Business Value through Real-Time Decisioning Solutions (Greg Makowski)
This is a first presentation of Kamanja, a new open-source real-time software product, which integrates with other big-data systems. See also links: http://www.meetup.com/SF-Bay-ACM/events/223615901/ and http://Kamanja.org to download, for docs or community support. For the YouTube video, see https://www.youtube.com/watch?v=g9d87rvcSNk (you may want to start at minute 33).
Self-Service Data Science for Leveraging ML & AI on All of Your Data (MapR Technologies)
MapR has launched the MapR Data Science Refinery which leverages a scalable data science notebook with native platform access, superior out-of-the-box security, and access to global event streaming and a multi-model NoSQL database.
In memory computing principles by Mac Moore of GridGain (Data Con LA)
In the presentation, we will provide an overview of general in-memory computing principles and the drivers behind it. We will start with a summary of the technical drivers (abundant hardware resources) and market forces (the rise of Big Data). We will cover popular and emerging use cases for in-memory computing, from financial industry trading platforms to mobile payment processing, online advertising, online/mobile gaming back-ends and more. We will then present some foundational concepts and terminology, and discuss considerations around any in-memory solution. From there, we will illustrate how a complete in-memory computing stack like GridGain combines clustering, high performance computing, in-memory data grids, stream processing and Hadoop acceleration into one unified and easy to use platform.
Blueprint Series: Banking In The Cloud – Ultra-high Reliability Architectures (Matt Stubbs)
Data architecture for a challenger bank.
Speaker: Jason Maude, Head of Technology Advocacy, Starling Bank
Speaker Bio: Jason Maude is a coder, coach, and public speaker. He has over a decade of experience working in the financial sector, primarily in creating and delivering software. He is passionate about explaining complex technical concepts to those who are convinced that they won't be able to understand them. He currently works at Starling Bank as their Head of Technology Advocacy and host of the Starling podcast.
Filmed at Skills Matter/Code Node London on 9th May 2019 as part of the Big Data LDN Meetup Blueprint Series.
Meetup sponsored by DataStax.
Speed Up Your Apache Cassandra™ Applications: A Practical Guide to Reactive P... (Matt Stubbs)
Speaker: Cedrick Lunven, Developer Advocate, DataStax
Speaker Bio: Cedrick is a Developer Advocate at DataStax where he finds opportunities to share his passions by speaking about developing distributed architectures and implementing reference applications for developers. In 2013, he created FF4j, an open source framework for Feature Toggle which he still actively maintains. He is now a contributor to the JHipster team.
Talk Synopsis: We have all introduced more or less functional programming and asynchronous operations into our applications in order to speed up and distribute treatments (e.g., multi-threading, future, completableFuture, etc.). To build truly non-blocking components, optimize resource usage, and avoid "callback hell" you have to think reactive—everything is an event.
From the frontend UI to database communications, it’s now possible to develop Java applications as fully reactive with frameworks like Spring WebFlux and Reactor. With high throughput and tunable consistency, applications built on top of Apache Cassandra™ fit perfectly within this pattern.
DataStax has been developing Apache Cassandra drivers for years, and in the latest version of the enterprise driver we introduced reactive programming.
During this session we will migrate, step by step, a vanilla CRUD Java service (SpringBoot / SpringMVC) into reactive with both code review and live coding. Bring home a working project!
Filmed at Skills Matter/Code Node London on 9th May 2019 as part of the Big Data LDN Meetup Blueprint Series.
Meetup sponsored by DataStax.
Similar to Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW TO FIX THEM!
Blueprint Series: Expedia Partner Solutions, Data Platform (Matt Stubbs)
Join Anselmo for an engaging overview of the new end-to-end data architecture at Expedia Group, taking a journey through cloud and on-prem data lakes, real-time and batch processes and streamlined access for data producers and consumers. Find out how the new architecture unifies a complex mix of data sources and feeds the data science development cycle. Expedia might appear to be a market-leading travel company – in reality, it’s a highly successful technology and data science company.
Blueprint Series: Architecture Patterns for Implementing Serverless Microserv... (Matt Stubbs)
Richard Freeman talks about how the data science team at JustGiving built KOALA, a fully serverless stack for real-time web analytics capture, stream processing, metrics API, and storage service, supporting live data at scale from over 26M users. He discusses recent advances in serverless computing, and how you can implement traditionally container-based microservice patterns using serverless-based architectures instead. Deploying Serverless in your organisation can dramatically increase the delivery speed, productivity and flexibility of the development team, while reducing the overall running, DevOps and maintenance costs.
Big Data LDN 2018: DATABASE FOR THE INSTANT EXPERIENCE (Matt Stubbs)
Date: 14th November 2018
Location: Customer Experience Theatre
Time: 12:30 - 13:00
Speaker: David Maitland
Organisation: Redis Labs
About: This session will cover the software infrastructure technology required to deliver the instant experience to end users and enterprises alike. Use cases and the value derived by major brands will be shared in this insightful session based on the world's most loved database, Redis.
Big Data LDN 2018: BIG DATA TOO SLOW? SPRINKLE IN SOME NOSQL (Matt Stubbs)
Date: 14th November 2018
Location: Customer Experience Theatre
Time: 11:50 - 12:20
Speaker: Perry Krug
Organisation: Couchbase
About: Who wants to see an ad today for the shoes they bought last week? Everyone knows that customer experience is driven by data: don't waste an opportunity to get them the right data at the right time. Real-time results are critical, but raw speed isn't everything: you need power and flexibility to react to changes on the fly. Come learn how market-leading enterprises are using Couchbase as their speed layer for ingestion, incremental view and presentation layers alongside Kafka, Spark and Hadoop to liberate their data lakes.
Big Data LDN 2018: ENABLING DATA-DRIVEN DECISIONS WITH AUTOMATED INSIGHTS (Matt Stubbs)
Date: 13th November 2018
Location: Customer Experience Theatre
Time: 11:50 - 12:20
Speaker: Charlotte Emms
Organisation: Seenit
About: How do you get your colleagues interested in the power of data? This talk takes you through Seenit’s journey using Couchbase's NoSQL database to create a regular, fully automated update in an easily digestible format.
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI... (Matt Stubbs)
Date: 14th November 2018
Location: Governance and MDM Theatre
Time: 10:30 - 11:00
Speaker: Mike Ferguson
Organisation: IBS
About: For most organisations today, data complexity has increased rapidly. In the area of operations, we now have cloud and on-premises OLTP systems with customers, partners and suppliers accessing these applications via APIs and mobile apps. In the area of analytics, we now have data warehouse, data marts, big data Hadoop systems, NoSQL databases, streaming data platforms, cloud storage, cloud data warehouses, and IoT-generated data being created at the edge. Also, the number of data sources is exploding as companies ingest more and more external data such as weather and open government data. Silos have also appeared everywhere as business users are buying in self-service data preparation tools without consideration for how these tools integrate with what IT is using to integrate data. Yet new regulations are demanding that we do a better job of governing data, and business executives are demanding more agility to remain competitive in a digital economy. So how can companies remain agile, reduce cost and reduce the time-to-value when data complexity is on the up?
In this session, Mike will discuss how companies can create an information supply chain to manufacture business-ready data and analytics to reduce time to value and improve agility while also getting data under control.
Date: 13th November 2018
Location: Governance and MDM Theatre
Time: 12:30 - 13:00
Organisation: Immuta
About: Artificial intelligence is rising in importance, but it’s also increasingly at loggerheads with data protection regimes like the GDPR—or so it seems. In this talk, Sophie will explain where and how AI and GDPR conflict with one another, and how to resolve these tensions.
Big Data LDN 2018: REALISING THE PROMISE OF SELF-SERVICE ANALYTICS WITH DATA ... (Matt Stubbs)
Date: 13th November 2018
Location: Governance and MDM Theatre
Time: 11:50 - 12:20
Speaker: Mark Pritchard
Organisation: Denodo
About: Self-service analytics promises to liberate business users to perform analytics without the assistance of IT, and this in turn promises to free IT to focus on enhancing the infrastructure.
Join us to learn how data virtualization will allow you to gain real-time access to enterprise-wide data and deliver self-service analytics. We will explore how you can seamlessly unify fragmented data, replace your high-maintenance and high cost data integrations with a single, low-maintenance data virtualization layer; and how you can preserve your data integrity and ensure data lineage is fully traceable.
Big Data LDN 2018: TURNING MULTIPLE DATA LAKES INTO A UNIFIED ANALYTIC DATA L... (Matt Stubbs)
Date: 13th November 2018
Location: Governance and MDM Theatre
Time: 11:10 - 11:40
Organisation: TIBCO
About: The big data phenomenon continues to accelerate, resulting in multiple data lakes at most organisations. However, according to Gartner, “Through 2019, 90% of the information assets from big data analytic efforts will be siloed and unusable across multiple business processes.”
Are you ready to unleash this data from these silos and deliver the insights your organisation needs to drive compelling customer experiences, innovative new products and optimized operations? In this session you will learn how to apply data virtualisation to:
- Access, transform and deliver data from across your lakes, clouds and other data sources
- Empower a range of analytic users and tools with all the data they need
- Move rapidly to a modern and flexible data architecture for the long run
In addition, you will see a demonstration of data virtualisation in action.
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO... (Matt Stubbs)
Date: 14th November 2018
Location: Data-Driven Ldn Theatre
Time: 12:30 - 13:00
Organisation: Cloudera
About: The growth of public cloud is reinforcing the need to think more carefully about taking a consistent approach to data governance as technology teams build out a flexible and agile infrastructure to meet the demands of the business.
Join this session to learn more about Cloudera's recommended approach for enterprise-grade security and governance and how to ensure a consistent framework across private, public and on-premises environments.
Big Data LDN 2018: MICROLISE: USING BIG DATA AND AI IN TRANSPORT AND LOGISTICS (Matt Stubbs)
Date: 14th November 2018
Location: Data-Driven Ldn Theatre
Time: 11:10 - 11:40
Organisation: Microlise
About: Microlise are a leading provider of technology solutions to the transport and logistics industry worldwide. Discover how, with over 400,000 connected assets generating billions of messages a day, Microlise is evolving its platform to bring real-time analytics to its customers to improve safety, security and efficiency outcomes.
Big Data LDN 2018: EXPERIAN: MAXIMISE EVERY OPPORTUNITY IN THE BIG DATA UNIVERSE (Matt Stubbs)
Date: 14th November 2018
Location: Data-Driven Ldn Theatre
Time: 10:30 - 11:00
Speaker: Anna Matty
Organisation: Experian
About: Today there is a widespread focus on the 'how' in relation to problem solving. How can we gain better knowledge of what consumers want, or need? How can we be more efficient, reduce the cost to serve, or grow the lifetime value of a customer? But, how do you move to a place where you are not only solving a problem, you are redesigning the entire strategic potential of that problem? You are being armed with insight on what the problem is.
Data and innovation offer huge potential to revolutionise all markets. There is an opportunity to be one step ahead of the need, to redesign journeys and enhance enterprise strategies. To do this you need access to the most advanced analytics but also the best quality, including variations and types of data, and then the technology that can act on this insight. Data science can present a unique opportunity for uncovered growth and accelerate your business through strategic innovation – fast. In this session you will hear more about how today's analytics can move from a single task, to an ongoing strategic opportunity. An opportunity that helps you move at the speed of the market and helps you maximise every opportunity.
Big Data LDN 2018: A LOOK INSIDE APPLIED MACHINE LEARNING (Matt Stubbs)
Date: 13th November 2018
Location: Data-Driven Ldn Theatre
Time: 13:10 - 13:40
Speaker: Brian Goral
Organisation: Cloudera
About: The field of machine learning (ML) ranges from the very practical and pragmatic to the highly theoretical and abstract. This talk describes several of the challenges facing organisations that want to leverage more of their data through ML, including some examples of the applied algorithms that are already delivering value in business contexts.
Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE... (Matt Stubbs)
Date: 13th November 2018
Location: Data-Driven Ldn Theatre
Time: 12:30 - 13:00
Speaker: Paul Wilkinson, Naveen Gupta
Organisation: Cloudera
About: Investment banks are faced with some of the toughest regulatory requirements in the world. In a market where data is increasing and changing at extraordinary rates the journey with data governance never ends.
In this session, Deutsche Bank will share their journey with big data and explain some of the processes and techniques they have employed to prepare the bank for today’s challenges and tomorrow’s opportunities.
Brought to you by Naveen Gupta, VP Software Engineering, Deutsche Bank and Paul Wilkinson, Principal Solutions Architect, Cloudera.
Big Data LDN 2018: FROM PROLIFERATION TO PRODUCTIVITY: MACHINE LEARNING DATA ... (Matt Stubbs)
Date: 14th November 2018
Location: Self-Service Analytics Theatre
Time: 13:50 - 14:20
Speaker: Stephanie McReynolds
Organisation: Alation
About: Raw data is proliferating at an enormous rate. But so are our derived data assets - hundreds of dashboards, thousands of reports, millions of transformed data sets. Self-service analytics have ensured that this noise is making it increasingly hard to understand and trust data for decision-making. This trust gap is holding your organisation back from business outcomes.
European analytics leaders have found a way to close the gap between data and decision-making. From MunichRe to Pfizer and Daimler, analytics teams are adopting data catalogues for thousands of self-service analytics users.
Join us in this session to hear how data catalogues that activate data by incorporating machine learning can:
• Increase analyst productivity 20-40%
• Boost the understanding of the nuances of data and
• Establish trust in data-driven decisions with agile stewardship
Big Data LDN 2018: DATA APIS DON’T DISCRIMINATE (Matt Stubbs)
Date: 13th November 2018
Location: Self-Service Analytics Theatre
Time: 15:50 - 16:20
Speaker: Nishanth Kadiyala
Organisation: Progress
About: The exploding API economy, combined with an advanced analytics market projected to reach $30 billion by 2019, is forcing IT to expose more and more data through APIs. Business analysts, data engineers, and data scientists are still not happy because their needs never really made it into the existing API strategies. This is because most APIs are designed for application integration, but not for the data workers who are looking for APIs that facilitate direct data access to run complex analytics. Data APIs are specifically designed to provide that frictionless data access experience to support analytics across standard interoperable interfaces such as OData (REST) or ODBC/JDBC (SQL). Consider expanding your API strategy to service the developers with open analytics in this $30 billion market.
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf (GetInData)
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is growing interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLM context and prompt augmentation when building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built a robust Data Copilot on these three concepts, one that can help democratize access to company data assets and boost the performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf (Enterprise Wired)
In this guide, we'll explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... (Subhajit Sahu)
Abstract: Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It does, however, come with a precondition: the input graph must have no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by the submission of a large number of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
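The idea in the abstract above can be sketched in a few lines of Python. This is a minimal single-machine illustration, not the report's implementation: the function names, the damping factor of 0.85, the tolerance, and the dict-of-lists graph format are all assumptions made for the example, and components are processed one at a time in topological order rather than grouped into levels.

```python
from collections import defaultdict


def strongly_connected_components(graph):
    """Tarjan's algorithm (recursive, so only suitable for small graphs).

    Returns components in reverse topological order of the block-graph.
    """
    index, low, on_stack, stack, comps = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            comps.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return comps


def levelwise_pagerank(graph, damping=0.85, tol=1e-10, max_iter=1000):
    """Hypothetical helper: graph maps every vertex to its out-neighbours."""
    n = len(graph)
    out_deg = {v: len(ws) for v, ws in graph.items()}
    if any(d == 0 for d in out_deg.values()):
        raise ValueError("graph must not contain dead ends")
    incoming = defaultdict(list)
    for u, ws in graph.items():
        for w in ws:
            incoming[w].append(u)

    rank = {v: 1.0 / n for v in graph}
    # Upstream components come first, so their ranks are already final
    # by the time a downstream component starts iterating.
    for comp in reversed(strongly_connected_components(graph)):
        for _ in range(max_iter):
            new = {
                v: (1 - damping) / n
                + damping * sum(rank[u] / out_deg[u] for u in incoming[v])
                for v in comp
            }
            delta = sum(abs(new[v] - rank[v]) for v in comp)
            rank.update(new)
            if delta < tol:
                break
    return rank


if __name__ == "__main__":
    example = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c"]}
    print(levelwise_pagerank(example))
```

Because each component only reads ranks from components that have already converged, no component needs to exchange values with another during its own iterations, which is the property the abstract refers to as avoiding per-iteration communication.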
Analysis insight about a Flyball dog competition team's performance (roli9797)
Insights from my analysis of a Flyball dog competition team's performance over the last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
The Building Blocks of QuestDB, a Time Series Database (javier ramirez)
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps as just another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, AI, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found