The Netflix data platform: Now and in the future by Kurt BrownData Con LA
Abstract:- The Netflix data platform is constantly evolving, but at it's core, it's an all-cloud platform at a massive scale (60+ PB and over 700 billion new events per day), focused on enabling developers. In this talk, we'll dive into the current (data) technology landscape at Netflix, as well as what's in the works. We'll cover key technologies, such as Spark, Presto, Docker, and Jupyter, along with many broader data ecosystem facets (metadata, insights into jobs run, visualizing big data, etc.). Beyond just tech, we'll also dive a bit into our data platform philosophy. You'll leave with insights into how things work at Netflix, along with some ideas for re-envisioning your data platform.
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...Maya Lumbroso
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can Speed up the World"
Bio:
Ronan Corkery is a kdb+ engineer who has been working with Kx and First Derivatives for the past 4 years. Currently based in Total Gas and Power he spent his first 2 year working with Morgan Stanley.
Abstract:
Ronan's presentation will focus on the vertical industries the formally only finance based technologies Kx offers has been moving into. He will present proven solutions as well as introducing the overall architecture that Kx uses as well as laying out potential opportunities to work with Kx.
The Netflix data platform: Now and in the future by Kurt BrownData Con LA
Abstract:- The Netflix data platform is constantly evolving, but at it's core, it's an all-cloud platform at a massive scale (60+ PB and over 700 billion new events per day), focused on enabling developers. In this talk, we'll dive into the current (data) technology landscape at Netflix, as well as what's in the works. We'll cover key technologies, such as Spark, Presto, Docker, and Jupyter, along with many broader data ecosystem facets (metadata, insights into jobs run, visualizing big data, etc.). Beyond just tech, we'll also dive a bit into our data platform philosophy. You'll leave with insights into how things work at Netflix, along with some ideas for re-envisioning your data platform.
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...Maya Lumbroso
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can Speed up the World"
Bio:
Ronan Corkery is a kdb+ engineer who has been working with Kx and First Derivatives for the past 4 years. Currently based in Total Gas and Power he spent his first 2 year working with Morgan Stanley.
Abstract:
Ronan's presentation will focus on the vertical industries the formally only finance based technologies Kx offers has been moving into. He will present proven solutions as well as introducing the overall architecture that Kx uses as well as laying out potential opportunities to work with Kx.
Alois Reitbauer - Big Data made it's way into everyday analytics processes. Artificial Intelligence is also becoming more mainstream in automated data analysis and interpretation. The next challenge we have to solve is how to integrate all the data and findings into our natural work environments. We have been working on a system to enabling humans to interact via chat applications like Slack with data and an Artificial Intelligence analysis layer. The talk covers the whole spectrum from designing the system, key considerations in implementing the system and lessons learned from using it in the wild.
Watch this recorded webcast and listen to Infochimps CSO and Co-Founder, Dhruv Bansal, and Think Big Analytics Principal Architect, Douglas Moore, share successful use cases and recommendations for building real-time predictive analytics in your enterprise.
Here's the deck we used for our Series-B round. We raised $150M 6 months after our Series-A and 8 months prior our Seed. It was led by Altimeter and Coatue.
Even though we didn't necessarily show the appendix slides, we sent them along with the rest of the deck.
See https://airbyte.com
Deep Learning Image Processing Applications in the EnterpriseGanesan Narayanasamy
The presentation has many use cases covering the following Image classification: "The process of identifying and detecting an object or a feature in a digital image or video," the report states. In retail, deep learning models "quickly scan and analyze in-store imagery to intuitively determine inventory movement."
Voice recognition: "The ability to receive and interpret dictation or to understand and carry out spoken commands. Models are able to convert captured voice commands to text and then use natural language processing to understand what is being said and in what context." In transportation, deep learning "uses voice commands to enable drivers to make phone calls and adjust internal controls - all without taking their hands off the steering wheel."
Anomaly detection: "Deep learning technique strives to recognize abnormal patterns which don't match the behaviors expected for a particular system, out of millions of different transactions. These applications can lead to the discovery of an attack on financial networks, fraud detection in insurance filings or credit card purchases, even isolating sensor data in industrial facilities signifying a safety issue."
Recommendation engines: "Analyze user actions in order to provide recommendations based on user behavior."
Sentiment analysis: "Leverages deep learning-heavy techniques such as natural language processing, text analysis, and computational linguistics to gain clear insight into customer opinion, understanding of consumer sentiment, and measuring the impact of marketing strategies."
Video analysis: "Process and evaluate vast streams of video footage for a range of tasks including threat detection, which can be used in airport security, banks, and sporting events."
CTO View: Driving the On-Demand Economy with Predictive AnalyticsSingleStore
In the on-demand economy real-time analytics is both a necessity and a competitive advantage. The next evolution in the on-demand economy is in predictive analytics fueled by live streams of data—in effect knowing what customers want before they do. This session will feature technical examples of real-time pipelines, machine learning, and custom dashboards as well as off-the-shelf dashboards with Tableau.
Analysing data analytics use cases to understand big data platformdataeaze systems
Get big picture of data platform architecture by knowing its purpose and problem it solves.
These slides take top down approach, starting with basic purpose of data platform ie. to serve analytics use cases. These slides categorise use cases and analyses their expectation from data platform.
Big Data LDN 2018: DATABASE FOR THE INSTANT EXPERIENCEMatt Stubbs
Date: 14th November 2018
Location: Customer Experience Theatre
Time: 12:30 - 13:00
Speaker: David Maitland
Organisation: Redis Labs
About: This session will cover the technology underpinning at the software infrastructure level required to deliver the instant experience to the end user and enterprises alike. Use cases and value derived by major brands will be shared in this insightful session based the world's most loved database REDIS.
Big Data LDN 2018: BIG DATA TOO SLOW? SPRINKLE IN SOME NOSQLMatt Stubbs
Date: 14th November 2018
Location: Customer Experience Theatre
Time: 11:50 - 12:20
Speaker: Perry Krug
Organisation: Couchbase
About: Who wants to see an ad today for the shoes they bought last week? Everyone knows that customer experience is driven by data: don't waste an opportunity to get them the right data at the right time. Real-time results are critical, but raw speed isn't everything: you need power and flexibility to react to changes on the fly. Come learn how market-leading enterprises are using Couchbase as their speed layer for ingestion, incremental view and presentation layers alongside Kafka, Spark and Hadoop to liberate their data lakes.
Bigger, faster, and cloudier: that’s where big data is headed in 2016. More people are doing more things faster with their data, but the details of how continue to evolve. Get up to speed on the latest trends in big data.
An Introduction to the MapR Converged Data PlatformMapR Technologies
Listen to the webinar on-demand: http://info.mapr.com/WB_Partner_CDP_Intro_EMEA_DG_17.05.31_RegistrationPage.html
In this 90-minute webinar, we discuss:
- The MapR Converged Data Platform and its components
- Use cases for the Converged Data Platform
- MapR Converged Partner Program
- How to get started with MapR
- Becoming a partner
How to Quantify the Value of Kafka in Your Organization confluent
(Lyndon Hedderly, Confluent) Kafka Summit SF 2018
We all know real-time data has a value. But how do you quantify that value in order to create a business case for becoming more data, or event driven?
The first half of this talk will explore the value of data across a variety of organizations, starting with the five most valuable companies in the world: Apple, Alphabet (Google), Microsoft, Amazon and Facebook (based on stock prices July 2017). We will go on to discuss other digital natives: Uber, Ebay, Netflix and LinkedIn, before exploring more traditional companies across retail, finance and automotive. Next, we’ll look at non-businesses such as governments and lobbyists. Whether organizations are using data to create new business products and services, improve user experiences, increase productivity, manage risk or influencing global power, we’ll see that fast and interconnected data, or “event streaming” is increasingly important.
After showing that data value can be quantified, the second half of this talk will explain the five steps to creating a business case.
Most businesses focus on:
-Making more money or conferring competitive advantage to make more money
-Increasing efficiency to save money and/or
-Mitigating risk to the business to protect money
-We’ll walk through examples of real business cases, discuss how business cases have evolved over the years and show the power of a sound business case. If you’re interested in big money and big business, as well as big data, this talk is for you.
Big data expert and Infochimps CEO, Jim Kaskade presents the Infinite Monkey Theorem at CloudCon Expo. He provides an energetic, inspiring, and practical perspective on why Big Data is disrupting. It’s more than historic data analyzed on Hadoop. It’s also more than real-time streaming data stored and queried using NoSQL. Learn more at www.Infochimps.com
Alois Reitbauer - Big Data made it's way into everyday analytics processes. Artificial Intelligence is also becoming more mainstream in automated data analysis and interpretation. The next challenge we have to solve is how to integrate all the data and findings into our natural work environments. We have been working on a system to enabling humans to interact via chat applications like Slack with data and an Artificial Intelligence analysis layer. The talk covers the whole spectrum from designing the system, key considerations in implementing the system and lessons learned from using it in the wild.
Watch this recorded webcast and listen to Infochimps CSO and Co-Founder, Dhruv Bansal, and Think Big Analytics Principal Architect, Douglas Moore, share successful use cases and recommendations for building real-time predictive analytics in your enterprise.
Here's the deck we used for our Series-B round. We raised $150M 6 months after our Series-A and 8 months prior our Seed. It was led by Altimeter and Coatue.
Even though we didn't necessarily show the appendix slides, we sent them along with the rest of the deck.
See https://airbyte.com
Deep Learning Image Processing Applications in the EnterpriseGanesan Narayanasamy
The presentation has many use cases covering the following Image classification: "The process of identifying and detecting an object or a feature in a digital image or video," the report states. In retail, deep learning models "quickly scan and analyze in-store imagery to intuitively determine inventory movement."
Voice recognition: "The ability to receive and interpret dictation or to understand and carry out spoken commands. Models are able to convert captured voice commands to text and then use natural language processing to understand what is being said and in what context." In transportation, deep learning "uses voice commands to enable drivers to make phone calls and adjust internal controls - all without taking their hands off the steering wheel."
Anomaly detection: "Deep learning technique strives to recognize abnormal patterns which don't match the behaviors expected for a particular system, out of millions of different transactions. These applications can lead to the discovery of an attack on financial networks, fraud detection in insurance filings or credit card purchases, even isolating sensor data in industrial facilities signifying a safety issue."
Recommendation engines: "Analyze user actions in order to provide recommendations based on user behavior."
Sentiment analysis: "Leverages deep learning-heavy techniques such as natural language processing, text analysis, and computational linguistics to gain clear insight into customer opinion, understanding of consumer sentiment, and measuring the impact of marketing strategies."
Video analysis: "Process and evaluate vast streams of video footage for a range of tasks including threat detection, which can be used in airport security, banks, and sporting events."
CTO View: Driving the On-Demand Economy with Predictive AnalyticsSingleStore
In the on-demand economy real-time analytics is both a necessity and a competitive advantage. The next evolution in the on-demand economy is in predictive analytics fueled by live streams of data—in effect knowing what customers want before they do. This session will feature technical examples of real-time pipelines, machine learning, and custom dashboards as well as off-the-shelf dashboards with Tableau.
Analysing data analytics use cases to understand big data platformdataeaze systems
Get big picture of data platform architecture by knowing its purpose and problem it solves.
These slides take top down approach, starting with basic purpose of data platform ie. to serve analytics use cases. These slides categorise use cases and analyses their expectation from data platform.
Big Data LDN 2018: DATABASE FOR THE INSTANT EXPERIENCEMatt Stubbs
Date: 14th November 2018
Location: Customer Experience Theatre
Time: 12:30 - 13:00
Speaker: David Maitland
Organisation: Redis Labs
About: This session will cover the technology underpinning at the software infrastructure level required to deliver the instant experience to the end user and enterprises alike. Use cases and value derived by major brands will be shared in this insightful session based the world's most loved database REDIS.
Big Data LDN 2018: BIG DATA TOO SLOW? SPRINKLE IN SOME NOSQLMatt Stubbs
Date: 14th November 2018
Location: Customer Experience Theatre
Time: 11:50 - 12:20
Speaker: Perry Krug
Organisation: Couchbase
About: Who wants to see an ad today for the shoes they bought last week? Everyone knows that customer experience is driven by data: don't waste an opportunity to get them the right data at the right time. Real-time results are critical, but raw speed isn't everything: you need power and flexibility to react to changes on the fly. Come learn how market-leading enterprises are using Couchbase as their speed layer for ingestion, incremental view and presentation layers alongside Kafka, Spark and Hadoop to liberate their data lakes.
Bigger, faster, and cloudier: that’s where big data is headed in 2016. More people are doing more things faster with their data, but the details of how continue to evolve. Get up to speed on the latest trends in big data.
An Introduction to the MapR Converged Data PlatformMapR Technologies
Listen to the webinar on-demand: http://info.mapr.com/WB_Partner_CDP_Intro_EMEA_DG_17.05.31_RegistrationPage.html
In this 90-minute webinar, we discuss:
- The MapR Converged Data Platform and its components
- Use cases for the Converged Data Platform
- MapR Converged Partner Program
- How to get started with MapR
- Becoming a partner
How to Quantify the Value of Kafka in Your Organization confluent
(Lyndon Hedderly, Confluent) Kafka Summit SF 2018
We all know real-time data has a value. But how do you quantify that value in order to create a business case for becoming more data, or event driven?
The first half of this talk will explore the value of data across a variety of organizations, starting with the five most valuable companies in the world: Apple, Alphabet (Google), Microsoft, Amazon and Facebook (based on stock prices July 2017). We will go on to discuss other digital natives: Uber, Ebay, Netflix and LinkedIn, before exploring more traditional companies across retail, finance and automotive. Next, we’ll look at non-businesses such as governments and lobbyists. Whether organizations are using data to create new business products and services, improve user experiences, increase productivity, manage risk or influencing global power, we’ll see that fast and interconnected data, or “event streaming” is increasingly important.
After showing that data value can be quantified, the second half of this talk will explain the five steps to creating a business case.
Most businesses focus on:
-Making more money or conferring competitive advantage to make more money
-Increasing efficiency to save money and/or
-Mitigating risk to the business to protect money
-We’ll walk through examples of real business cases, discuss how business cases have evolved over the years and show the power of a sound business case. If you’re interested in big money and big business, as well as big data, this talk is for you.
Big data expert and Infochimps CEO, Jim Kaskade presents the Infinite Monkey Theorem at CloudCon Expo. He provides an energetic, inspiring, and practical perspective on why Big Data is disrupting. It’s more than historic data analyzed on Hadoop. It’s also more than real-time streaming data stored and queried using NoSQL. Learn more at www.Infochimps.com
Apache Kafka as Event Streaming Platform for Microservice ArchitecturesKai Wähner
This session introduces Apache Kafka, an event-driven open source streaming platform. Apache Kafka goes far beyond scalable, high volume messaging. In addition, you can leverage Kafka Connect for integration and the Kafka Streams API for building lightweight stream processing microservices in autonomous teams. The Confluent Platform adds further components such as a Schema Registry, REST Proxy, KSQL, Clients for different programming languages and Connectors for different technologies.
The session discusses how tech giants like LinkedIn, Ebay or Airbnb leverage Apache Kafka as event streaming platform to solve various different business problems and how to create a scalable, flexible microservice architecture. A live demo shows how you can easily process and analyze streams of events using Apache Kafka and KSQL.
Neha Narkhede talks about the experience at LinkedIn moving from batch-oriented ETL to real-time streams using Apache Kafka and how the design and implementation of Kafka was driven by this goal of acting as a real-time platform for event data. She covers some of the challenges of scaling Kafka to hundreds of billions of events per day at Linkedin, supporting thousands of engineers, etc.
Fast Data – Fast Cars: Wie Apache Kafka die Datenwelt revolutioniertconfluent
Für die Automobilindustrie ist die digitale Transformation wie für jede andere Branche zugleich eine digitale Revolution: Neue Marktspieler, neue Technologien und die in immer größeren Mengen anfallenden Daten schaffen neue Chancen, aber auch neue Herausforderungen – und erfordern neben neuen IT-Architekturen auch völlig neue Denkansätze.
60% der Fortune500-Unternehmen setzen zur Umsetzung ihrer Daten-Streaming-Projekte auf die umfassende verteilte Streaming-Plattform Apache Kafka®, darunter auch die AUDI AG.
Erfahren Sie in diesem Webinar:
Wie Kafka als Grundlage sowohl für Daten-Pipelines als auch für Anwendungen dient, die Echtzeit-Datenströme konsumieren und verarbeiten.
Wie Kafka Connect und Kafka Streams geschäftskritische Anwendungen unterstützt
Wie Audi mithilfe von Kafka und Confluent eine Fast Data IoT-Plattform umgesetzt hat, die den Bereich „Connected Car“ revolutioniert
Sprecher:
David Schmitz, Principal Architect, Audi Electronics Venture GmbH
Kai Waehner, Technology Evangelist, Confluent
Event: https://www.meetup.com/de-DE/Vienna-Kafka-meetup/events/262314643/
Speaker: Patrik Kleindl (patrik.kleindl@bearingpoint.com)
Slides of the introduction to Apache Kafka and some popular use cases.
Slides were provided by Confluent (confluent.io)
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)Kai Wähner
Learn the differences between an event-driven streaming platform and middleware like MQ, ETL and ESBs – including best practices and anti-patterns, but also how these concepts and tools complement each other in an enterprise architecture.
Extract-Transform-Load (ETL) is still a widely-used pattern to move data between different systems via batch processing. Due to its challenges in today’s world where real time is the new standard, an Enterprise Service Bus (ESB) is used in many enterprises as integration backbone between any kind of microservice, legacy application or cloud service to move data via SOAP / REST Web Services or other technologies. Stream Processing is often added as its own component in the enterprise architecture for correlation of different events to implement contextual rules and stateful analytics. Using all these components introduces challenges and complexities in development and operations.
This session discusses how teams in different industries solve these challenges by building a native streaming platform from the ground up instead of using ETL and ESB tools in their architecture. This allows to build and deploy independent, mission-critical streaming real time application and microservices. The architecture leverages distributed processing and fault-tolerance with fast failover, no-downtime rolling deployments and the ability to reprocess events, so you can recalculate output when your code changes. Integration and Stream Processing are still key functionality but can be realized in real time natively instead of using additional ETL, ESB or Stream Processing tools.
dotScale 2017 Keynote: The Rise of Real Time by Neha Narkhedeconfluent
Slides from Neha Narkhede's keynote at the dotScale conference in Paris on April 24th, 2017.
There is a tectonic shift happening in how data powers the core of a company's business. This shift is about the rise of real-time. Apache Kafka was built with the vision to help companies navigate this change and become the central nervous system that makes data available in real-time to all the applications that need to use it.
This talk is about how you can put Apache Kafka to practice to help your company make this shift to real-time. And how the Connect and Streams API in Apache Kafka capture the entire scope of what it means to put streams into practice.
Streaming Data Ingest and Processing with Apache KafkaAttunity
Apache™ Kafka is a fast, scalable, durable, and fault-tolerant
publish-subscribe messaging system. It offers higher throughput, reliability and replication. To manage growing data volumes, many companies are leveraging Kafka for streaming data ingest and processing.
Join experts from Confluent, the creators of Apache™ Kafka, and the experts at Attunity, a leader in data integration software, for a live webinar where you will learn how to:
-Realize the value of streaming data ingest with Kafka
-Turn databases into live feeds for streaming ingest and processing
-Accelerate data delivery to enable real-time analytics
-Reduce skill and training requirements for data ingest
The recorded webinar on slide 32 includes a demo using automation software (Attunity Replicate) to stream live changes from a database into Kafka and also includes a Q&A with our experts.
For more information, please go to www.attunity.com/kafka.
View the recording:
http://hortonworks.com/webinar/accelerating-real-time-data-ingest-hadoop/
Hadoop didn’t disrupt the data center. The exploding amounts of data did. But, let’s face it, if you can’t move your data to Hadoop, then you can’t use it in Hadoop. The experts from Hortonworks, the #1 leader in Hadoop development, and Attunity, a leading data management software provider, cover:
- How to ingest your most valuable data into Hadoop using Attunity Replicate
- About how customers are using Hortonworks DataFlow (HDF) powered by Apache NiFi
- How to combine the real-time change data capture (CDC) technology with connected data platforms from Hortonworks
We discuss how Attunity Replicate and Hortonworks Data Flow (HDF) work together to move data into Hadoop.
Data Streaming with Apache Kafka & MongoDB - EMEAAndrew Morgan
A new generation of technologies is needed to consume and exploit today's real time, fast moving data sources. Apache Kafka, originally developed at LinkedIn, has emerged as one of these key new technologies.
This webinar explores the use-cases and architecture for Kafka, and how it integrates with MongoDB to build sophisticated data-driven applications that exploit new sources of data.
Webinar: Data Streaming with Apache Kafka & MongoDBMongoDB
A new generation of technologies is needed to consume and exploit today's real time, fast moving data sources. Apache Kafka, originally developed at LinkedIn, has emerged as one of these key new technologies.
Lightbend Fast Data Platform - A Technical Overview
Dean Wampler, O’Reilly author and Big Data Strategist in the office of the CTO at Lightbend discusses practical tips for architecting stream-processing applications and explains how you can tame some of the complexity in moving from data at rest to data in motion.
Un'introduzione ad Apache Kafka e Kafka Connect APIs (part of Apache Kafka), in particolare come Kafka possa essere usato assieme ad Elasticsearch.
Grazie a Seacom per averci invitato all'evento a Roma.
In this slidecast, Jim Kaskade from Infochimps presents: Cloud for Big Data.
"Infochimps was founded by data scientists and cloud computing experts. Our solutions make it faster, easier and far less complex to build and manage Big Data systems behind applications to quickly deliver actionable insights. With Infochimps Cloud, enterprises benefit from the fastest way to deploy Big Data applications in complex, hybrid cloud environments."
Learn more at:
http://infochimps.com
View the presentation video:
http://inside-bigdata.com/slidecast-cloud-for-big-data/
Data Streaming with Apache Kafka & MongoDBconfluent
Explore the use-cases and architecture for Apache Kafka, and how it integrates with MongoDB to build sophisticated data-driven applications that exploit new sources of data.
Blueprint Series: Banking In The Cloud – Ultra-high Reliability ArchitecturesMatt Stubbs
Data architecture for a challenger bank.Speaker: Jason Maude, Head of Technology Advocacy, Starling BankSpeaker Bio: Jason Maude is a coder, coach, and public speaker. He has over a decade of experience working in the financial sector, primarily in creating and delivering software. He is passionate about explaining complex technical concepts to those who are convinced that they won't be able to understand them. He currently works at Starling Bank as their Head of Technology Advocacy and host of the Starling podcast.Filmed at Skills Matter/Code Node London on 9th May 2019 as part of the Big Data LDN Meetup Blueprint Series.Meetup sponsored by DataStax.
Speed Up Your Apache Cassandra™ Applications: A Practical Guide to Reactive P...Matt Stubbs
Speaker: Cedrick Lunven, Developer Advocate, DataStax
Speaker Bio: Cedrick is a Developer Advocate at DataStax where he finds opportunities to share his passions by speaking about developing distributed architectures and implementing reference applications for developers. In 2013, he created FF4j, an open source framework for Feature Toggle which he still actively maintains. He is now contributor in JHipster team.
Talk Synopsis: We have all introduced more or less functional programming and asynchronous operations into our applications in order to speed up and distribute treatments (e.g., multi-threading, future, completableFuture, etc.). To build truly non-blocking components, optimize resource usage, and avoid "callback hell" you have to think reactive—everything is an event.
From the frontend UI to database communications, it’s now possible to develop Java applications as fully reactive with frameworks like Spring WebFlux and Reactor. With high throughput and tunable consistency, applications built on top of Apache Cassandra™ fit perfectly within this pattern.
DataStax has been developing Apache Cassandra drivers for years, and in the latest version of the enterprise driver we introduced reactive programming.
During this session we will migrate, step by step, a vanilla CRUD Java service (SpringBoot / SpringMVC) into reactive with both code review and live coding. Bring home a working project!
Filmed at Skills Matter/Code Node London on 9th May 2019 as part of the Big Data LDN Meetup Blueprint Series.
Meetup sponsored by DataStax.
Blueprint Series: Expedia Partner Solutions, Data PlatformMatt Stubbs
Join Anselmo for an engaging overview of the new end-to-end data architecture at Expedia Group, taking a journey through cloud and on-prem data lakes, real-time and batch processes and streamlined access for data producers and consumers. Find out how the new architecture unifies a complex mix of data sources and feeds the data science development cycle. Expedia might appear to be a market-leading travel company – in reality, it’s a highly successful technology and data science company.
Blueprint Series: Architecture Patterns for Implementing Serverless Microserv...Matt Stubbs
Richard Freeman talks about how the data science team at JustGiving built KOALA, a fully serverless stack for real-time web analytics capture, stream processing, metrics API, and storage service, supporting live data at scale from over 26M users. He discusses recent advances in serverless computing, and how you can implement traditionally container-based microservice patterns using serverless-based architectures instead. Deploying Serverless in your organisation can dramatically increase the delivery speed, productivity and flexibility of the development team, while reducing the overall running, DevOps and maintenance costs.
Big Data LDN 2018: ENABLING DATA-DRIVEN DECISIONS WITH AUTOMATED INSIGHTSMatt Stubbs
Date: 13th November 2018
Location: Customer Experience Theatre
Time: 11:50 - 12:20
Speaker: Charlotte Emms
Organisation: seenit
About: How do you get your colleagues interested in the power of data? Taking you through Seenit’s journey using Couchbase's NoSQL database to create a regular, fully automated update in an easily digestible format.
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...Matt Stubbs
Date: 14th November 2018
Location: Governance and MDM Theatre
Time: 10:30 - 11:00
Speaker: Mike Ferguson
Organisation: IBS
About: For most organisations today, data complexity has increased rapidly. In the area of operations, we now have cloud and on-premises OLTP systems with customers, partners and suppliers accessing these applications via APIs and mobile apps. In the area of analytics, we now have data warehouse, data marts, big data Hadoop systems, NoSQL databases, streaming data platforms, cloud storage, cloud data warehouses, and IoT-generated data being created at the edge. Also, the number of data sources is exploding as companies ingest more and more external data such as weather and open government data. Silos have also appeared everywhere as business users are buying in self-service data preparation tools without consideration for how these tools integrate with what IT is using to integrate data. Yet new regulations are demanding that we do a better job of governing data, and business executives are demanding more agility to remain competitive in a digital economy. So how can companies remain agile, reduce cost and reduce the time-to-value when data complexity is on the up?
In this session, Mike will discuss how companies can create an information supply chain to manufacture business-ready data and analytics to reduce time to value and improve agility while also getting data under control.
Date: 13th November 2018
Location: Governance and MDM Theatre
Time: 12:30 - 13:00
Organisation: Immuta
About: Artificial intelligence is rising in importance, but it’s also increasingly at loggerheads with data protection regimes like the GDPR—or so it seems. In this talk, Sophie will explain where and how AI and GDPR conflict with one another, and how to resolve these tensions.
Big Data LDN 2018: REALISING THE PROMISE OF SELF-SERVICE ANALYTICS WITH DATA ...Matt Stubbs
Date: 13th November 2018
Location: Governance and MDM Theatre
Time: 11:50 - 12:20
Speaker: Mark Pritchard
Organisation: Denodo
About: Self-service analytics promises to liberate business users to perform analytics without the assistance of IT, and this in turn promises to free IT to focus on enhancing the infrastructure.
Join us to learn how data virtualization will allow you to gain real-time access to enterprise-wide data and deliver self-service analytics. We will explore how you can seamlessly unify fragmented data, replace your high-maintenance and high cost data integrations with a single, low-maintenance data virtualization layer; and how you can preserve your data integrity and ensure data lineage is fully traceable.
Big Data LDN 2018: TURNING MULTIPLE DATA LAKES INTO A UNIFIED ANALYTIC DATA L...Matt Stubbs
Date: 13th November 2018
Location: Governance and MDM Theatre
Time: 11:10 - 11:40
Organisation: TIBCO
About: The big data phenomenon continues to accelerate, resulting in multiple data lakes at most organisations. However, according to Gartner, “Through 2019, 90% of the information assets from big data analytic efforts will be siloed and unusable across multiple business processes.”
Are you ready to unleash this data from these silos and deliver the insights your organisation needs to drive compelling customer experiences, innovative new products and optimized operations? In this session you will learn how to apply data virtualisation to: - Access, transform and deliver data from across your lakes, clouds and other data sources - Empower a range of analytic users and tools with all the data they need - Move rapidly to a modern and flexible data architecture for the long run In addition, you will see a demonstration of data virtualisation in action.
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...Matt Stubbs
Date: 14th November 2018
Location: Data-Driven Ldn Theatre
Time: 12:30 - 13:00
Organisation: Cloudera
About: The growth of public cloud is reinforcing the need to think more carefully about taking a consistent approach to data governance as technology teams build out a flexible and agile infrastructure to meet the demands of the business.
Join this session to learn more about Cloudera's recommended approach for enterprise-grade security and governance and how to ensure a consistent framework across private, public and on-premises environments.
Big Data LDN 2018: MICROLISE: USING BIG DATA AND AI IN TRANSPORT AND LOGISTICSMatt Stubbs
Date: 14th November 2018
Location: Data-Driven Ldn Theatre
Time: 11:10 - 11:40
Organisation: Microlise
About: Microlise are a leading provider of technology solutions to the transport and logistics industry worldwide. Discover how, with over 400,000 connected assets generating billions of messages a day, Microlise is evolving its platform to bring real-time analytics to its customers to improve safety, security and efficiency outcomes.
Big Data LDN 2018: EXPERIAN: MAXIMISE EVERY OPPORTUNITY IN THE BIG DATA UNIVERSEMatt Stubbs
Date: 14th November 2018
Location: Data-Driven Ldn Theatre
Time: 10:30 - 11:00
Speaker: Anna Matty
Organisation: Experian
About: Today there is a widespread focus on the 'how' in relation to problem solving. How can we gain better knowledge of what consumers want, or need? How can we be more efficient, reduce the cost to serve, or grow the lifetime value of a customer? But, how do you move to a place where you are not only solving a problem, you are redesigning the entire strategic potential of that problem? You are being armed with insight on what the problem is.
Data and innovation offer huge potential to revolutionise all markets. There is an opportunity to be one step ahead of the need, to redesign journeys and enhance enterprise strategies. To do this you need access to the most advanced analytics but also the best quality, including variations and types of data, and then the technology that can act on this insight. Data science can present a unique opportunity for uncovered growth and accelerate your business through strategic innovation – fast. In this session you will hear more about how today's analytics can move from a single task, to an ongoing strategic opportunity. An opportunity that helps you move at the speed of the market and helps you maximise every opportunity.
Big Data LDN 2018: A LOOK INSIDE APPLIED MACHINE LEARNINGMatt Stubbs
Date: 13th November 2018
Location: Data-Driven Ldn Theatre
Time: 13:10 - 13:40
Speaker: Brian Goral
Organisation: Cloudera
About: The field of machine learning (ML) ranges from the very practical and pragmatic to the highly theoretical and abstract. This talk describes several of the challenges facing organisations that want to leverage more of their data through ML, including some examples of the applied algorithms that are already delivering value in business contexts.
Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...Matt Stubbs
Date: 13th November 2018
Location: Data-Driven Ldn Theatre
Time: 12:30 - 13:00
Speaker: Paul Wilkinson, Naveen Gupta
Organisation: Cloudera
About: Investment banks are faced with some of the toughest regulatory requirements in the world. In a market where data is increasing and changing at extraordinary rates the journey with data governance never ends.
In this session, Deutsche Bank will share their journey with big data and explain some of the processes and techniques they have employed to prepare the bank for today’s challenges and tomorrow’s opportunities.
Brought to you by Naveen Gupta, VP Software Engineering, Deutsche Bank and Paul Wilkinson, Principal Solutions Architect, Cloudera.
Big Data LDN 2018: FROM PROLIFERATION TO PRODUCTIVITY: MACHINE LEARNING DATA ...Matt Stubbs
Date: 14th November 2018
Location: Self-Service Analytics Theatre
Time: 13:50 - 14:20
Speaker: Stephanie McReynolds
Organisation: Alation
About: Raw data is proliferating at an enormous rate. But so are our derived data assets - hundreds of dashboards, thousands of reports, millions of transformed data sets. Self-service analytics have ensured that this noise is making it increasingly hard to understand and trust data for decision-making. This trust gap is holding your organisation back from business outcomes.
European analytics leaders have found a way to close the gap between data and decision-making. From MunichRe to Pfizer and Daimler, analytics teams are adopting data catalogues for thousands of self-service analytics users.
Join us in this session to hear how data catalogues that activate data by incorporating machine learning can:
• Increase analyst productivity 20-40%
• Boost the understanding of the nuances of data and
• Establish trust in data-driven decisions with agile stewardship
Big Data LDN 2018: DATA APIS DON’T DISCRIMINATEMatt Stubbs
Date: 13th November 2018
Location: Self-Service Analytics Theatre
Time: 15:50 - 16:20
Speaker: Nishanth Kadiyala
Organisation: Progress
About: The exploding API economy, combined with an advanced analytics market projected to reach $30 billion by 2019, is forcing IT to expose more and more data through APIs. Business analysts, data engineers, and data scientists are still not happy because their needs never really made it into the existing API strategies. This is because most APIs are designed for application integration, but not for the data workers who are looking for APIs that facilitate direct data access to run complex analytics. Data APIs are specifically designed to provide that frictionless data access experience to support analytics across standard interoperable interfaces such as OData (REST) or ODBC/JDBC (SQL). Consider expanding your API strategy to service the developers with open analytics in this $30 billion market.
Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKESMatt Stubbs
Date: 13th November 2018
Location: Self-Service Analytics Theatre
Time: 14:30 - 15:00
Speaker: Zaf Khan
Organisation: Arcadia Data
About: The use of data lakes continue to grow, and a recent survey by Eckerson Group shows that organizations are getting real value from their deployments. However, there’s still a lot of room for improvement when it comes to giving business users access to the wealth of potential insights in the data lake.
While the data management aspect has been fairly well understood over the years, the success of business intelligence (BI) and analytics on data lakes lags behind. In fact, organizations often struggle with data lakes because they are only accessible by highly-skilled data scientists and not by business users. But BI tools have been able to access data warehouses for years, so what gives?
In this talk, we’ll discuss:
• Why traditional BI tools are architected well for data warehouses, but not data lakes.
• Why every organization should have two BI standards: one for data warehouses and one for data lakes.
• Innovative capabilities provided by BI for data lakes
Big Data LDN 2018: FIGHTING DATA CHAOS: CONNECTING USERS TO DATA AT SCALEMatt Stubbs
Date: 13th November 2018
Location: Self-Service Analytics
Theatre Time: 12:30 - 13:00
Speaker: Joel McKelvey
Organisation: Looker
About: Companies that use data well are more efficient, effective, and profitable. Unfortunately, most organizations struggle to keep up with the changing supply of data — and the growing business demands for that data. The key is to connect data supply to data users in a way that scales, supports existing workflows, and serves as a foundation for the future.
This session will explore how to bring data to users where and when they need it without sacrificing data governance or unified metrics. This session will also present proven ways to build a data foundation for your organisation that can support future changes in both data supply and data demand.
Specifically, attendees will discover:
• The key considerations to driving the most value from data, including: self-service, governance, custom interfaces, modeling, and connections to existing business systems.
• How to provide users access to data in a way that naturally fits in their existing workflows and allows users to take immediate action.
• How companies like Deliveroo and King extract critical business insights from growing data and deliver those insights to their business users.
Analysis insight about a Flyball dog competition team's performanceroli9797
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfGetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source ) Copilot?
How can we build one?
Architecture and evaluation
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Adjusting OpenMP PageRank : SHORT REPORT / NOTESSubhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take
advantage of a shared memory system with multiple CPUs, each with multiple cores, to
accelerate pagerank computation. If the NUMA architecture of the system is properly taken
into account with good vertex partitioning, the speedup can be significant. To take steps in
this direction, experiments are conducted to implement pagerank in OpenMP using two
different approaches, uniform and hybrid. The uniform approach runs all primitives required
for pagerank in OpenMP mode (with multiple threads). On the other hand, the hybrid
approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found