How we use Hive at SnowPlow, and how the role of Hive is changing – yalisassoon
The document summarizes how SnowPlow uses Apache Hive and other big data technologies to perform web analytics. It discusses how Hive is used at SnowPlow, the strengths and weaknesses of Hive versus alternatives, and how SnowPlow is leveraging technologies like Scalding, Infobright, and Mahout for more robust ETL, faster queries, and machine learning capabilities beyond SQL.
On the importance of evolving your data pipeline with your business, and how Snowplow enables that through self-describing data and the ability to recompute your data models on the entire event data set.
This document discusses event data and the Snowplow data pipeline. It notes that 3 years ago, analyzing user behavior and engagement using tools like Google Analytics was difficult. The Snowplow data pipeline was created to collect and analyze event-level data at scale using open source big data technologies. The pipeline has expanded to encompass different types of digital event data by developing a schema for structured JSON events and a versioning system. A real-time version of the pipeline is also being built to feed event data into applications in addition to batch processing. Developing a semantic model and standard framework for describing events is discussed as being important for enabling downstream applications to consume structured event data.
2016 09 measurecamp - event data modeling – yalisassoon
Presentation by Christophe Bogaert to Measurecamp London September 2016. Christophe discussed what makes consuming and analysing event-streams difficult, and outlined a number of techniques for overcoming those obstacles.
Why use big data tools to do web analytics? And how to do it using Snowplow a... – yalisassoon
There are a number of mature web analytics products that have been on the market for ~20 years. Big data tools have only really taken off in the last 5 years. So why use big data tools to mine web analytics data?
In this presentation, I explore the limitations of traditional approaches to web analytics, and explain how big data tools can be used to address those limitations and drive more value from the underlying data. I explain how a combination of Snowplow and Qubole can be used to do this in practice.
Snowplow - Evolve your analytics stack with your business – Giuseppe Gaviani
The document discusses how analytics stacks need to evolve with businesses over time as products, questions, and data change. It describes how Snowplow users can define their own event and entity schemas to model their data. Self-describing data validated against these schemas allows adding new sources, updating existing events/entities, and recomputing data models on the full dataset as needed. This enables the analytics pipeline to evolve flexibly in response to new business questions or tracked information.
Snowplow at DA Hub emerging technology showcase – yalisassoon
This document discusses Snowplow Analytics and its approach to reinventing digital analytics. Snowplow allows users to define their own event types, track events across all channels, and answer any questions by storing all digital event data in their own data warehouse. This enables users to join their data to other data sets, pick their own processing logic, and plug in any analytics tools. Snowplow is also open source, free, allows users to own their data and intelligence, and is scalable for tracking large volumes of events.
The analytics journey at Viewbix - how they came to use Snowplow and the setu... – yalisassoon
This document summarizes the evolution of video measurement and analytics solutions used by a company. It describes several solutions the company implemented, including sending tracking events to a server hosted on Rackspace [1], distributing event collection to Akamai and processing with Hadoop/Hive/SQL in Azure [2], and ultimately implementing a solution using Snowplow that addressed all of their requirements [3]. Key benefits of Snowplow included no limits on data, flexible data modeling, fast reporting and owning their own data. The document ends by discussing lessons learned around data quality, infrastructure costs, modeling needs and focusing on small, actionable insights from big data.
Snowplow is at the core of everything we do – yalisassoon
This document discusses Bauer Media's use of Snowplow for data collection and analytics across their 116 websites in Australia and New Zealand. Some key points:
- Bauer Media started collecting Snowplow data in 2014 without a specific use case in mind.
- They now use Snowplow data for cross-site reporting, ad hoc analysis, checking audience reports, and stalking individual users.
- Snowplow allows them to track things like page views, user behavior, content metadata, and ads that can't be tracked as well with Google Analytics.
Yali presentation for snowplow amsterdam meetup number 2 – yalisassoon
Digital analytics is a very exciting place to work because digital event data is becoming more interesting as more of our digital lives are intermediated by digital platforms.
In this presentation I explain how at Snowplow we're working to make it easier to build insight and act on digital event data.
This document discusses different solutions for collecting, aggregating, and analyzing tracking event data from websites. It describes problems with an initial solution that collected data on a local server, including network bottlenecks and lack of scalability. Two alternative solutions are proposed: 1) distributing data collection to Akamai and aggregating to Azure, but this had issues with reporting speed and flexibility, and 2) using a third-party solution. The third-party solution, Snowplow, is ultimately selected as it meets requirements for handling different tracker types, unlimited data collection, fast reporting, data ownership, and flexibility. Challenges encountered include data errors, lack of auto-scaling, database query optimization, and over-reliance on visualization
How to evolve your analytics stack with your business using Snowplow – Giuseppe Gaviani
This document discusses evolving an analytics stack to match business changes. It recommends defining self-describing event and entity schemas that can be updated over time. Event data modeling aggregates raw events into modeled data like users and sessions for easier analysis. To evolve the data pipeline, businesses should use self-describing data that allows recomputing models on historical data when new questions arise or data collection changes.
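To make the event data modeling step concrete, here is a minimal sketch (field names such as `user_id` and `event_ts` are assumptions, and this is illustrative Python rather than Snowplow's actual SQL models) of rolling a raw event stream up into per-user sessions:

```python
from datetime import datetime, timedelta

# Hypothetical raw events; real Snowplow events carry many more fields.
events = [
    {"user_id": "u1", "event_ts": datetime(2016, 3, 1, 9, 0), "event": "page_view"},
    {"user_id": "u1", "event_ts": datetime(2016, 3, 1, 9, 10), "event": "add_to_cart"},
    {"user_id": "u1", "event_ts": datetime(2016, 3, 1, 14, 0), "event": "page_view"},
    {"user_id": "u2", "event_ts": datetime(2016, 3, 1, 9, 5), "event": "page_view"},
]

SESSION_GAP = timedelta(minutes=30)  # start a new session after 30 minutes of inactivity

def sessionize(raw_events):
    """Group each user's events into sessions, splitting on gaps longer than SESSION_GAP."""
    by_user = {}
    for e in sorted(raw_events, key=lambda e: (e["user_id"], e["event_ts"])):
        by_user.setdefault(e["user_id"], []).append(e)

    sessions = []
    for user_id, user_events in by_user.items():
        current = [user_events[0]]
        for prev, nxt in zip(user_events, user_events[1:]):
            if nxt["event_ts"] - prev["event_ts"] > SESSION_GAP:
                sessions.append({"user_id": user_id, "events": current})
                current = []
            current.append(nxt)
        sessions.append({"user_id": user_id, "events": current})
    return sessions

for s in sessionize(events):
    print(s["user_id"], len(s["events"]), "events")
```

In practice this kind of model is usually expressed against the event warehouse itself, which is what makes it cheap to recompute over the full history whenever the model's definition changes.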
Using Snowplow for A/B testing and user journey analysis at CustomMade – yalisassoon
This document discusses user journeys, analyzing how users interact with a website over time to understand conversion rates. It describes building a unified data model to visualize customer journeys and see that most visitors to listing pages leave the site. The document also discusses how more page views are linked to more conversions and how A/B testing can be done using event tracking to test different page variants and their impact on user behavior and conversions.
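As an illustration of the A/B-testing idea above, here is a hedged sketch of computing per-variant conversion rates from event-level data; the field names (`experiment_variant`, `conversion`) are assumptions, not CustomMade's actual tracking design:

```python
from collections import defaultdict

# Hypothetical events: each visitor is tagged with the page variant they were shown.
events = [
    {"user_id": "u1", "experiment_variant": "A", "event": "page_view"},
    {"user_id": "u1", "experiment_variant": "A", "event": "conversion"},
    {"user_id": "u2", "experiment_variant": "B", "event": "page_view"},
    {"user_id": "u3", "experiment_variant": "B", "event": "page_view"},
    {"user_id": "u3", "experiment_variant": "B", "event": "conversion"},
]

visitors = defaultdict(set)    # variant -> users exposed to it
converters = defaultdict(set)  # variant -> users who converted

for e in events:
    visitors[e["experiment_variant"]].add(e["user_id"])
    if e["event"] == "conversion":
        converters[e["experiment_variant"]].add(e["user_id"])

for variant in sorted(visitors):
    rate = len(converters[variant]) / len(visitors[variant])
    print(f"variant {variant}: {rate:.0%} conversion")
```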
Big data meetup Budapest - adding data schemas to Snowplow – yalisassoon
The document discusses adding data schemas to Snowplow, an open-source web and event analytics platform. It describes how Snowplow is evolving from a web analytics platform to a general event analytics platform to handle an infinite number of possible event types from any connected device. To address this, the document proposes adding JSON schemas to define the structure of each event type. These schemas would be versioned and stored in a central schema repository/registry to define the structure of raw and enriched events processed by Snowplow.
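To show what a schema'd, self-describing event could look like, here is a small sketch using the `jsonschema` Python library; the vendor, event name and fields in the schema URI are invented for the example rather than taken from an actual Snowplow registry:

```python
import jsonschema  # pip install jsonschema

# A versioned JSON Schema, of the kind that would live in a central schema registry.
LINK_CLICK_SCHEMA = {
    "description": "Illustrative schema for a link-click event, version 1-0-0",
    "type": "object",
    "properties": {
        "targetUrl": {"type": "string"},
        "elementId": {"type": "string"},
    },
    "required": ["targetUrl"],
    "additionalProperties": False,
}

# A self-describing event: the payload carries a reference to its own schema and version.
event = {
    "schema": "iglu:com.example/link_click/jsonschema/1-0-0",  # hypothetical schema URI
    "data": {"targetUrl": "https://example.com/pricing", "elementId": "cta-button"},
}

# The kind of validation step an enrichment pipeline could perform before loading the event.
jsonschema.validate(instance=event["data"], schema=LINK_CLICK_SCHEMA)
print("event validates against", event["schema"])
```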
Snowplow: open source game analytics powered by AWS – Giuseppe Gaviani
This is a presentation by Alex Dean and Yali Sassoon at Snowplow about open source game analytics powered by AWS. It was presented at the Games Developer Conference (GDC) in San Francisco, February 2017
Our cofounder Alex Dean gave an introduction to Snowplow and then talked about our roadmap for 2017. Alex touched on several topics including support for more clouds, support for more storage targets, tailoring Snowplow to your industry, more intelligent event sources, moving our batch pipeline to Spark, mega-scale Snowplow and real-time support for Sauna, our decisioning and response system. Presented on 5 April 2017.
Implementing improved and consistent arbitrary event tracking company-wide us... – yalisassoon
Talk on the role Snowplow plays as part of the larger project to make data accessible to product marketing and other data-driven teams at StumbleUpon. Touches on technical and organizational challenges
Snowplow: evolve your analytics stack with your business – yalisassoon
Deep dive into how digital analytics stacks need to evolve with businesses, and how self-describing data and event data modeling are the key elements that enable Snowplow data pipelines to elegantly evolve over time.
Data driven video advertising campaigns - JustWatch & Snowplow – Giuseppe Gaviani
This document summarizes JustWatch's journey in building a universal customer relationship management (CRM) system focused on the movie industry. It discusses how JustWatch started using the Snowplow open-source data pipeline to collect user data from their movie apps. JustWatch then automated building targeted audiences and managing user identities across channels to power personalized, data-driven movie marketing campaigns on platforms like YouTube and Facebook. The document outlines the key components developed, including an Audience Builder, ID Service, and interactions module. It also shares learnings around questioning real-time needs and the value of separating operational and analytics data stores.
Metail uses Snowplow to collect customer data and Cascalog to process that data into normalized batch views for analysis. Cascalog transforms the raw Snowplow event stream into structured tables for things like customer body shape, orders, items ordered, returns, and browsers. This makes the data more manageable and complex analysis and aggregation easier. For example, Cascalog is used to calculate key performance indicators by grouping customer data and summing metrics from the batch views. The output is then analyzed further in R. Looker will also allow business analysts to access and explore the batch views and raw Snowplow event data.
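Cascalog itself is a Clojure DSL, so as a rough, illustrative equivalent of the grouping-and-summing step described above, here is a sketch in Python; the `orders` batch view and its fields are assumptions, not Metail's actual tables:

```python
from collections import defaultdict

# Hypothetical "orders" batch view, derived upstream from the raw Snowplow event stream.
orders = [
    {"customer_id": "c1", "order_value": 120.0},
    {"customer_id": "c1", "order_value": 80.0},
    {"customer_id": "c2", "order_value": 45.0},
]

# KPI calculation: order count and total revenue per customer.
revenue = defaultdict(float)
order_count = defaultdict(int)
for order in orders:
    revenue[order["customer_id"]] += order["order_value"]
    order_count[order["customer_id"]] += 1

for customer_id in sorted(revenue):
    print(customer_id, order_count[customer_id], "orders,", revenue[customer_id], "total revenue")
```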
TripleLift: Preparing for a New Programmatic Ad-Tech World – VoltDB
Michael Harroun, Director of Backend Architecture at TripleLift explores the benefits of leveraging real-time databases to power their programmatic native advertisement exchange.
Presentation given by Christophe Bogaert at the inaugural Snowplow Meetup New York in March 2016. Christophe described the event data modeling process at a high level before diving into specific tools and techniques for developing performant models.
Chefsfeed presentation to Snowplow Meetup San Francisco, Oct 2015 – yalisassoon
The document discusses ChefsFeed, a marketing platform for chefs and restaurants. It outlines some of the key challenges with traditional marketing data systems, which focus only on direct conversions like purchases, installations, or registrations. However, this ignores a lot of important user behavior data that occurs between marketing exposure and conversions. The document then introduces Snowplow as a solution that can track all user behavior events consistently across systems, providing a complete picture of the user journey from first exposure through conversions. This unified event data is valuable for understanding user experiences and improving marketing strategies.
Snowplow Analytics: from NoSQL to SQL and back again – Alexander Dean
A talk I gave to London NoSQL about Snowplow's journey from using NoSQL (via Amazon S3 and Hive), to columnar storage (via Amazon Redshift and PostgreSQL), and most recently to a mixed model of NoSQL and SQL, including S3, Redshift and Elasticsearch.
SigFig is a financial technology company founded in 2007 whose mission is to make investing more accessible and affordable for everyone. SigFig uses Snowplow, an open-source data collection platform, to track user behavior and metrics on their website similarly to Google Analytics. Snowplow provides critical data for SigFig's reporting and data science needs by replicating their production environment. The document provides examples of how SigFig uses Snowplow data to analyze metrics like user interactions on pages, traffic sources, landing page and funnel performance, and A/B testing results.
KPIs play a different role in startups than mature businesses. In startups, KPIs should focus on measuring progress towards achieving product-market fit rather than traditional metrics like customer acquisition and retention. To develop startup KPIs, companies first identify key success factors that drive product-market fit, then establish one or a few KPIs to measure each success factor. Good startup KPIs are relevant, responsive, easy to understand, and part of a broader analytics effort to inform ongoing product development.
This document provides an overview of the big data technology stack, including the data layer (HDFS, S3, GPFS), data processing layer (MapReduce, Pig, Hive, HBase, Cassandra, Storm, Solr, Spark, Mahout), data ingestion layer (Flume, Kafka, Sqoop), data presentation layer (Kibana), operations and scheduling layer (Ambari, Oozie, ZooKeeper), and concludes with a brief biography of the author.
Customer Event Hub - the modern Customer 360° view – Guido Schmutz
Today, companies use various channels to communicate with their customers. As a consequence, a lot of data is created, increasingly also outside the traditional IT infrastructure of the enterprise. This data often has no common format and is generated continuously, in ever-increasing volumes. With the Internet of Things (IoT) and its sensors, both the volume and the velocity of data become even more extreme.
To achieve a complete and consistent view of a customer, all of this customer-related information has to be brought together into a 360-degree view, in real time or near real time. The Customer Hub thereby becomes the Customer Event Hub: it constantly shows the current view of a customer across all interaction channels and gives the enterprise the basis for a substantial and effective customer relationship.
This presentation shows the value of such a platform and how it can be implemented.
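A minimal sketch of the idea, assuming a stream of interaction events keyed by customer ID; a production Customer Event Hub would sit on a streaming platform rather than an in-memory dictionary:

```python
from collections import defaultdict

# Current 360-degree view per customer, updated as events arrive from any channel.
customer_view = defaultdict(lambda: {"channels": set(), "last_event": None, "event_count": 0})

def on_event(event):
    """Fold a single interaction event into that customer's current view."""
    view = customer_view[event["customer_id"]]
    view["channels"].add(event["channel"])
    view["last_event"] = event["type"]
    view["event_count"] += 1

# Events arriving from different channels in (near) real time.
on_event({"customer_id": "c42", "channel": "web", "type": "page_view"})
on_event({"customer_id": "c42", "channel": "call_center", "type": "complaint"})
on_event({"customer_id": "c42", "channel": "mobile_app", "type": "purchase"})

print(customer_view["c42"])
```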
Data-Driven Development Era and Its Technologies – SATOSHI TAGOMORI
This document discusses data-driven development and the technologies used in the data analytics process. It covers topics like data collection, storage, processing, and visualization. The document advocates using managed cloud services for data and analytics to focus on data instead of managing infrastructure. Choosing technologies should be based on the type of data and problems to solve, not the other way around. Services like Google BigQuery, Amazon Redshift, and Treasure Data are recommended for their ease of use.
Designing and building your data architecture with AWS has never been easier, making it simpler to gain insights and uncover new opportunities to scale and grow your business. Join this workshop to learn how you can gain insights at scale with the right big data applications.
Trivadis TechEvent 2016 Customer Event Hub - the modern Customer 360° view by... – Trivadis
Companies today communicate with their customers over a wide variety of channels. This creates a lot of data in different systems, increasingly also outside the company. This data often has no uniform format and is generated continuously, in ever-growing volumes. IoT applications will only make this more extreme. To achieve a complete and consistent view of the customer, all of this customer-related information must be brought together into a 360-degree view, ideally in real time. The Customer Hub thus becomes the Customer Event Hub.
This document discusses analyzing IoT data in real time using Microsoft Azure and Hortonworks Data Platform. It begins with an introduction to IoT and how organizations benefit from IoT data. It then discusses how Azure and Hortonworks can help handle large and fast streaming IoT data. The document demonstrates a use case of analyzing real-time sensor data from vehicles in the logistics industry. It collects data using Kafka, processes it using Storm, stores it on Hadoop and visualizes it using Power BI. It concludes by discussing how organizations can leverage IoT data analytics.
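As a small, hedged illustration of the ingestion step (Kafka) in that pipeline, the sketch below sends a made-up vehicle sensor reading to an assumed `vehicle-telemetry` topic using the `kafka-python` client; the message format and broker address are illustrative, not those used in the demo:

```python
import json
from kafka import KafkaProducer  # pip install kafka-python

# Hypothetical vehicle sensor reading; the demo's actual message format is not given in the summary.
reading = {"vehicle_id": "truck-17", "speed_kmh": 83.5, "fuel_level": 0.42, "ts": "2016-05-01T10:15:00Z"}

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("vehicle-telemetry", reading)  # downstream, Storm would consume and process this topic
producer.flush()
```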
Top Business Intelligence Trends for 2016 by Panorama Software
10 top BI trends for 2016 – by Panorama
It's all about the insight
Visual perception rules
The learning suggestive system - AI gets real
The data product chain becomes democratized
Cloud (finally)
“Mobile”
Automated data integration
Internet of Things data accelerating into reality
Hadoop accelerators are the last chance for Hadoop
Fading of the centralized on-premise DWH
This document provides an overview of big data and business analytics. It discusses the key characteristics of big data, including volume, variety, and velocity. Volume refers to the enormous and growing amount of data being generated. Variety means data comes in all types from structured to unstructured. Velocity indicates that data is being created in real-time and needs to be analyzed rapidly. The document also outlines some of the challenges of big data and how cloud computing and technologies like Hadoop are helping to manage and analyze large, complex data sets.
This document summarizes a presentation about the graph database Neo4j. The presentation included an agenda that covered graphs and their power, how graphs change data views, and real-time recommendations with graphs. It introduced the presenters and discussed how data relationships unlock value. It described how Neo4j allows modeling data as a graph to unlock this value through relationship-based queries, evolution of applications, and high performance at scale. Examples showed how Neo4j outperforms relational and NoSQL databases when relationships are important. The presentation concluded with examples of how Neo4j customers have benefited.
This document discusses strategies for successfully utilizing a data lake. It notes that creating a data lake is just the beginning and that challenges include data governance, metadata management, access, and effective use of the data. The document advocates for data democratization through discovery, accessibility, and usability. It also discusses best practices like self-service BI and automated workload migration from data warehouses to reduce costs and risks. The key is to address the "data lake dilemma" of these challenges to avoid a "data swamp" and slow adoption.
This document introduces Snowplow, an open-source web and event analytics platform. Snowplow was created in 2012 by Alex Dean and Yali Sassoon to address limitations in traditional analytics programs. It allows users to capture, transform, store and analyze granular event-level data in their own data warehouses. Snowplow uses a loosely coupled architecture and supports collecting data from any source and storing it in various databases. It aims to provide a platform for real-time and offline analytics across an organization.
5 Essential Practices of the Data Driven Organization – Vivastream
The document discusses five essential practices of data-driven organizations: 1) defining key performance indicators, 2) deploying analytics tools expertly across channels, 3) analyzing results and making recommendations, 4) creating changes based on data, and 5) measuring results continuously. It emphasizes the importance of standardization, governance, accuracy, and having a repeatable process for using data to optimize digital properties and drive business goals.
Winning in Today's Data-Centric Economy (Part 1) – Alexander Loth
This document discusses how data and analytics are central to success in today's digital economy. It notes that existing business systems were built for products and transactions, not long-term customer relationships, and data is now everywhere. The document advocates developing a data-centric strategy that uses analytics to extract value from data and wrap data around customers to create business value. It provides examples of how analytics can reduce report creation time and help organizations better understand their data, customers, and make strategic decisions.
The future of application and data integration is in the cloud. The SnapLogic Integration Cloud.
In a world that has never been more connected, how do you ensure that your business data isn't fragmented?
Salesforce, ServiceNow, Workday - are they streamlined or SaaS silos?
Whether you need real-time application integration, business process automation, or better business analytics, SnapLogic delivers an integration platform as a service (iPaaS) that connects all of your cloud applications, APIs, and disparate data sources with the rest of the enterprise.
Learn about the SnapLogic Integration Cloud Winter 2014 release:
- Get a faster time to value with our HTML5, cloud Designer
- Learn about Elastic Integration for hybrid and cloud-to-cloud deployments
- Dig into our enterprise-ready security, administration and monitoring
- Connect in a Snap with 160+ Snaps in the SnapStore
Learn more at http://www.snaplogic.com/
Cloud Machine Learning can help make sense of unstructured data, which accounts for 90% of enterprise data. It provides a fully managed machine learning service to train models using TensorFlow and automatically maximize predictive accuracy with hyperparameter tuning. Key benefits include scalable training and prediction infrastructure, integrated tools like Cloud Datalab for exploring data and developing models, and pay-as-you-go pricing.
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By Cloudera – MongoDB
Bernard Doering, Senior Sales Director DACH, Cloudera.
Hadoop and the Future of Data Management. As Hadoop takes the data management market by storm, organisations are evolving the role it plays in the modern data centre. Explore how this disruptive technology is quickly transforming an industry and how you can leverage it today, in combination with MongoDB, to drive meaningful change in your business.
Customer Event Hub – a modern Customer 360° view with DataStax Enterprise (DSE) – Guido Schmutz
Today, companies use various channels to communicate with their customers. As a consequence, a lot of data is created, increasingly also outside the traditional IT infrastructure of the enterprise. This data often has no common format and is generated continuously, in ever-increasing volumes. With the Internet of Things (IoT) and its sensors, both the volume and the velocity of data become even more extreme.
To achieve a complete and consistent view of a customer, all of this customer-related information has to be brought together into a 360-degree view, in real time or near real time. The Customer Hub thereby becomes the Customer Event Hub: it constantly shows the current view of a customer across all interaction channels and gives the enterprise the basis for a substantial and effective customer relationship.
This presentation shows the value of such a platform and how it can be implemented, using DataStax Enterprise as the backend.
Learn more - http://www.talend.com/products/talend-6
When you’re ready to move to Big Data, connect in the cloud, and across the Internet of Things, Talend 6 streamlines the process. Convert traditional data integration jobs and MapReduce jobs to Spark with the click of a button, and realize the potential of real-time data-driven decision making. Learn more about Talend and Spark.
Talend 6 also brings continuous delivery, MDM REST API, plus data masking and semantic discovery to our products.
Similar to Snowplow: where we came from and where we are going - March 2016 (20)
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You... – Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
1. Introduction to Jio Cinema:
- Brief overview of Jio Cinema as a streaming platform.
- Its significance in the Indian market.
- Introduction to retention and engagement strategies in the streaming industry.
2. Understanding Retention and Engagement:
- Define retention and engagement in the context of streaming platforms.
- Importance of retaining users in a competitive market.
- Key metrics used to measure retention and engagement.
3. Jio Cinema's Content Strategy:
- Analysis of the content library offered by Jio Cinema.
- Focus on exclusive content, originals, and partnerships.
- Catering to diverse audience preferences (regional, genre-specific, etc.).
- User-generated content and interactive features.
4. Personalization and Recommendation Algorithms:
- How Jio Cinema leverages user data for personalized recommendations.
- Algorithmic strategies for suggesting content based on user preferences, viewing history, and behavior.
- Dynamic content curation to keep users engaged.
5. User Experience and Interface Design:
- Evaluation of Jio Cinema's user interface (UI) and user experience (UX).
- Accessibility features and device compatibility.
- Seamless navigation and search functionality.
- Integration with other Jio services.
6. Community Building and Social Features:
- Strategies for fostering a sense of community among users.
- User reviews, ratings, and comments.
- Social sharing and engagement features.
- Interactive events and campaigns.
7. Retention through Loyalty Programs and Incentives:
- Overview of loyalty programs and rewards offered by Jio Cinema.
- Subscription plans and benefits.
- Promotional offers, discounts, and partnerships.
- Gamification elements to encourage continued usage.
8. Customer Support and Feedback Mechanisms:
- Analysis of Jio Cinema's customer support infrastructure.
- Channels for user feedback and suggestions.
- Handling of user complaints and queries.
- Continuous improvement based on user feedback.
9. Multichannel Engagement Strategies:
- Utilization of multiple channels for user engagement (email, push notifications, SMS, etc.).
- Targeted marketing campaigns and promotions.
- Cross-promotion with other Jio services and partnerships.
- Integration with social media platforms.
10. Data Analytics and Iterative Improvement:
- Role of data analytics in understanding user behavior and preferences.
- A/B testing and experimentation to optimize engagement strategies.
- Iterative improvement based on data-driven insights.
The Ipsos - AI - Monitor 2024 Report.pdf – Social Samosa
According to Ipsos AI Monitor's 2024 report, 65% of Indians said that products and services using AI have profoundly changed their daily lives in the past 3-5 years.
End-to-end pipeline agility - Berlin Buzzwords 2024 – Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
Enhanced data collection methods can help uncover the true extent of child abuse and neglect. This includes Integrated Data Systems from various sources (e.g., schools, healthcare providers, social services) to identify patterns and potential cases of abuse and neglect.
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
We are pleased to share with you the latest VCOSA statistical report on the cotton and yarn industry for the month of March 2024.
Starting from January 2024, the full weekly and monthly reports will only be available for free to VCOSA members. To access the complete weekly report with figures, charts, and detailed analysis of the cotton fiber market in the past week, interested parties are kindly requested to contact VCOSA to subscribe to the newsletter.
Open Source Contributions to Postgres: The Basics POSETTE 2024 – ElizabethGarrettChri
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
Did you know that drowning is a leading cause of unintentional death among young children? According to recent data, children aged 1-4 years are at the highest risk. Let's raise awareness and take steps to prevent these tragic incidents. Supervision, barriers around pools, and learning CPR can make a difference. Stay safe this summer!
2. Snowplow was born in 2012
Web data: rich but GA / SiteCatalyst are limited
• Marketing, not product analytics
• Silo’d: can’t join with other customer data
“Big data” tech
• Open source frameworks
• Cloud services
Snowplow
• Open source click stream data warehouse
• Event level: any query
• Built on top of Cloudfront / EMR / Hadoop
3. The plan: spend 6 months building a pipeline…
…then get back to using the data
5. Increased project scope
• Click stream data warehouse -> Event analytics platform
• Collect events from anywhere, not just the web
• Make event data actionable in real-time
• Support more in-pipeline processing steps (enrichment and modeling)
• Support more storage targets (where your data is has big implications for what you can do with that data)
9. What makes Snowplow special?
• Data pipeline evolves with your business
• Channel coverage
• Flexibility: where your data is delivered
• Flexibility: how your data is processed (enrichment and modeling)
• Data quality
• Speed
• Transparency
10. Used by 100s (1000s?) of companies…
…to answer their most important business questions
11. But there’s still much more to build!
• Improve automation around schema evolution (Iglu: machine-readable schema registry)
• Make modeling event data easier, more robust, more performant (data modeling in Spark)
• Support more storage targets (Druid, BigQuery, graph databases)
• Make it easier to act on event data (Analytics SDKs, Sauna)