This document summarizes Krist Wongsuphasawat's presentation on visualizing event sequences at the 2013 Data Visualization Summit in San Francisco. Wongsuphasawat covered techniques such as using glyphs on a timeline to represent events, interval width to represent duration, color and shape to distinguish event types, faceting for high-density sequences, and aggregation techniques like binning and kernel density estimation. He demonstrated the LifeFlow tool for providing overviews and summaries of event sequence data, and also discussed alignment of sequences, outcome-based aggregation with the Outflow tool, and applications to big event sequence data such as customer checkout processes at eBay.
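The binning idea mentioned above can be sketched in a few lines of Python. This is a toy illustration, not code from the talk; the event stream and bin width are invented for the example:

```python
from collections import Counter

def bin_events(events, bin_width):
    """Aggregate a sequence of (timestamp, event_type) pairs into
    fixed-width time bins -- the binning technique used to summarize
    high-density event sequences before rendering them."""
    bins = Counter()
    for t, kind in events:
        bins[(int(t // bin_width), kind)] += 1
    return dict(bins)

# Hypothetical event stream: timestamps in minutes, two event types.
events = [(1, "login"), (3, "click"), (12, "click"), (14, "logout")]
print(bin_events(events, bin_width=10))
# {(0, 'login'): 1, (0, 'click'): 1, (1, 'click'): 1, (1, 'logout'): 1}
```

Each bin's count can then drive the size or intensity of a glyph on the timeline instead of drawing every individual event.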
Explainable AI (XAI) is becoming a must-have non-functional requirement (NFR) for most AI-enabled product and solution deployments. Keen to hear viewpoints and explore collaboration opportunities.
Artificial Intelligence is increasingly playing an integral role in determining our day-to-day experiences. Moreover, with the proliferation of AI-based solutions in areas such as hiring, lending, criminal justice, healthcare, and education, the resulting personal and professional implications of AI are far-reaching. The dominant role played by AI models in these domains has led to a growing concern regarding potential bias in these models, and a demand for model transparency and interpretability. In addition, model explainability is a prerequisite for building trust and adoption of AI systems in high-stakes domains requiring reliability and safety, such as healthcare and automated transportation, and in critical industrial applications with significant economic implications, such as predictive maintenance, exploration of natural resources, and climate change modeling.
As a consequence, AI researchers and practitioners have focused their attention on explainable AI to help them better trust and understand models at scale. The challenges for the research community include (i) defining model explainability, (ii) formulating explainability tasks for understanding model behavior and developing solutions for these tasks, and finally (iii) designing measures for evaluating the performance of models in explainability tasks.
In this tutorial, we present an overview of model interpretability and explainability in AI, key regulations / laws, and techniques / tools for providing explainability as part of AI/ML systems. Then, we focus on the application of explainability techniques in industry, wherein we present practical challenges / guidelines for effectively using explainability techniques and lessons learned from deploying explainable models for several web-scale machine learning and data mining applications. We present case studies across different companies, spanning application domains such as search & recommendation systems, sales, lending, and fraud detection. Finally, based on our experiences in industry, we identify open problems and research directions for the data mining / machine learning community.
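As a concrete illustration of one widely used model-agnostic explainability technique, here is a minimal permutation-importance sketch in pure Python. This is not from the tutorial itself; the toy model and data are invented for illustration:

```python
import random

def permutation_importance(predict, X, y, score, n_repeats=10, seed=0):
    """Model-agnostic explainability sketch: measure how much the score
    drops when one feature column is shuffled, breaking that feature's
    relationship with the target. Larger drops mean the model relies
    more heavily on that feature."""
    rng = random.Random(seed)
    base = score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - score(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy model that only uses feature 0, so feature 1 should score zero.
model = lambda row: 1 if row[0] > 0 else 0
accuracy = lambda y, p: sum(a == b for a, b in zip(y, p)) / len(y)
X = [[1, 5], [-1, 5], [2, -3], [-2, -3]]
y = [1, 0, 1, 0]
imp = permutation_importance(model, X, y, accuracy)
```

In practice one would use a library implementation (e.g. scikit-learn's `permutation_importance`), but the core idea fits in a dozen lines.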
Scaling up Uber's real-time data analytics – Xiang Fu
Realtime infrastructure powers critical pieces of Uber. This talk will discuss the architecture, technical challenges, learnings and how a blend of open source infrastructure (Apache Kafka/Flink/Pinot) and in-house technologies have helped Uber scale and enabled SQL to power realtime decision making for city ops, data scientists, data analysts and engineers.
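The kind of realtime aggregation a system like Pinot serves via SQL at massive scale can be illustrated, at toy scale, with a sliding-window counter. This sketch is purely conceptual (the metric name is made up) and has nothing to do with Uber's actual codebase:

```python
from collections import deque

class SlidingWindowCount:
    """Toy realtime metric: count events per key over the last
    `window` seconds, evicting expired events on read."""
    def __init__(self, window):
        self.window = window
        self.events = deque()  # (timestamp, key), in arrival order

    def add(self, ts, key):
        self.events.append((ts, key))

    def count(self, now, key):
        # Evict events that fell out of the window, then count matches.
        while self.events and self.events[0][0] <= now - self.window:
            self.events.popleft()
        return sum(1 for _, k in self.events if k == key)

w = SlidingWindowCount(window=60)
w.add(0, "trips_sf")
w.add(30, "trips_sf")
w.add(90, "trips_sf")
print(w.count(100, "trips_sf"))  # prints 1: only the event at t=90 survives
```

Production systems replace this in-memory deque with partitioned, replicated storage and expose the query through SQL, but the window/eviction logic is the same idea.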
Productionalizing Models through CI/CD Design with MLflow – Databricks
Model deployment and integration often consist of several moving parts woven together through intricate steps. Automating this pipeline and feedback loop can be incredibly challenging, especially given varying model development techniques.
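One small piece of such a CI/CD pipeline can be sketched as an automated promotion gate. This is a hypothetical example, not MLflow's API; the metric names and thresholds are made up:

```python
def should_promote(candidate_metrics, production_metrics, min_improvement=0.0):
    """CI gate sketch: promote the candidate model only if it matches or
    beats the current production model on every tracked metric
    (metrics are assumed to be higher-is-better)."""
    return all(
        candidate_metrics[name] >= production_metrics[name] + min_improvement
        for name in production_metrics
    )

prod = {"auc": 0.81, "recall": 0.70}
cand = {"auc": 0.84, "recall": 0.72}
print(should_promote(cand, prod))  # True: candidate wins on both metrics
```

In a real pipeline this check would run in CI after training, reading both metric sets from a tracking server, and a passing gate would trigger a model-registry stage transition.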
Generative AI: Past, Present, and Future – A Practitioner's Perspective – Huahai Yang
As the academic realm grapples with the profound implications of generative AI and related applications like ChatGPT, I will present a grounded view from my experience as a practitioner. Starting with the origins of neural networks in the fields of logic, psychology, and computer science, I trace their history and align it within the wider context of the pursuit of artificial intelligence. This perspective will also draw parallels with historical developments in psychology. Against this backdrop, I chart a proposed trajectory for the future. Finally, I provide actionable insights for both academics and enterprising individuals in the field.
[Video recording available at https://www.youtube.com/playlist?list=PLewjn-vrZ7d3x0M4Uu_57oaJPRXkiS221]
MLOps (a compound of “machine learning” and “operations”) is a practice for collaboration and communication between data scientists and operations professionals to help manage the production machine learning lifecycle. Similar to the DevOps term in the software development world, MLOps looks to increase automation and improve the quality of production ML while also focusing on business and regulatory requirements. MLOps applies to the entire ML lifecycle - from integrating with model generation (software development lifecycle, continuous integration/continuous delivery), orchestration, and deployment, to health, diagnostics, governance, and business metrics.
To watch the full presentation click here: https://info.cnvrg.io/mlopsformachinelearning
In this webinar, we’ll discuss core practices in MLOps that will help data science teams scale to the enterprise level. You’ll learn the primary functions of MLOps, and which tasks are suggested to accelerate your team’s machine learning pipeline. Join us in a discussion with cnvrg.io Solutions Architect, Aaron Schneider, and learn how teams use MLOps for more productive machine learning workflows.
- Reduce friction between science and engineering
- Deploy your models to production faster
- Health, diagnostics and governance of ML models
- Kubernetes as a core platform for MLOps
- Support advanced use-cases like continual learning with MLOps
ROCm and Distributed Deep Learning on Spark and TensorFlow – Databricks
ROCm, the Radeon Open Ecosystem, is an open-source software foundation for GPU computing on Linux. ROCm supports TensorFlow and PyTorch using MIOpen, a library of highly optimized GPU routines for deep learning. In this talk, we describe how Apache Spark is a key enabling platform for distributed deep learning on ROCm, as it enables different deep learning frameworks to be embedded in Spark workflows in a secure end-to-end machine learning pipeline. We will analyse the different frameworks for integrating Spark with TensorFlow on ROCm, from Horovod to HopsML to Databricks' Project Hydrogen. We will also examine the surprising places where bottlenecks can surface when training models (everything from object stores to the Data Scientists themselves), and we will investigate ways to get around these bottlenecks. The talk will include a live demonstration of training and inference for a TensorFlow application embedded in a Spark pipeline written in a Jupyter notebook on Hopsworks with ROCm.
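The data-parallel training idea underlying Horovod-style frameworks can be illustrated with a toy sketch of the allreduce result: every worker ends up holding the element-wise mean of all workers' gradients. This is pure Python, not the actual Horovod or ROCm API:

```python
def allreduce_mean(worker_grads):
    """Sketch of what a ring-allreduce computes in data-parallel
    training: the element-wise mean of all workers' gradient vectors,
    which keeps every model replica in sync after each step."""
    n = len(worker_grads)
    summed = [sum(vals) for vals in zip(*worker_grads)]
    return [s / n for s in summed]

grads = [[1.0, 2.0], [3.0, 4.0]]   # gradients from two workers
print(allreduce_mean(grads))        # [2.0, 3.0]
```

Real allreduce implementations compute the same result without any central coordinator, passing partial sums around a ring so that bandwidth per worker stays constant as the cluster grows.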
DataOps: Nine steps to transform your data science impact, Strata London May 18 – Harvinder Atwal
According to Forrester Research, only 22% of companies are currently seeing a significant return from data science expenditures. Most data science implementations are high-cost IT projects, local applications that are not built to scale for production workflows, or laptop decision support projects that never impact customers. Despite this high failure rate, we keep hearing the same mantra and solutions over and over again. Everybody talks about how to create models, but not many people talk about getting them into production where they can impact customers.
Harvinder Atwal offers an entertaining and practical introduction to DataOps, a new and independent approach to delivering data science value at scale, used at companies like Facebook, Uber, LinkedIn, Twitter, and eBay. The key to adding value through DataOps is to adapt and borrow principles from Agile, Lean, and DevOps. However, DataOps is not just about shipping working machine learning models; it starts with better alignment of data science with the rest of the organization and its goals. Harvinder shares experience-based solutions for increasing your velocity of value creation, including Agile prioritization and collaboration, new operational processes for an end-to-end data lifecycle, developer principles for data scientists, cloud solution architectures to reduce data friction, self-service tools giving data scientists freedom from bottlenecks, and more. The DataOps methodology will enable you to eliminate daily barriers, putting your data scientists in control of delivering ever-faster cutting-edge innovation for your organization and customers.
In this presentation, two different data sets are used to apply the machine learning classification techniques introduced in the Introduction to Data Mining and Machine Learning coursework. Both data sets were chosen based on their outputs and the team members' interests: the Electrical Grid Stability simulated data set, and the Olivetti data set for face recognition.
Building and deploying LLM applications with Apache Airflow – Kaxil Naik
Behind the growing interest in generative AI and LLM-based enterprise applications lies an expanded set of requirements for data integrations and ML orchestration. Enterprises want to use proprietary data to power LLM-based applications that create new business value, but they face challenges in moving beyond experimentation. The pipelines that power these models need to run reliably at scale, bringing together data from many sources and reacting continuously to changing conditions.
This talk focuses on the design patterns for using Apache Airflow to support LLM applications created using private enterprise data. We’ll go through a real-world example of what this looks like, as well as a proposal to improve Airflow and to add additional Airflow Providers to make it easier to interact with LLMs such as those from OpenAI (e.g., GPT-4) and those on HuggingFace, while working with both structured and unstructured data.
In short, this shows how these Airflow patterns enable reliable, traceable, and scalable LLM applications within the enterprise.
https://airflowsummit.org/sessions/2023/keynote-llm/
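The extract-chunk-embed-load pattern that such a DAG typically orchestrates can be sketched in plain Python. All function names below are hypothetical stand-ins, not Airflow operators or OpenAI APIs, and the stand-in "embedding" is just a placeholder where a real pipeline would call a model:

```python
def extract(docs):
    """Pull raw enterprise documents (hypothetical stand-in source)."""
    return [d.strip() for d in docs if d.strip()]

def chunk(texts, size=20):
    """Split each document into fixed-size character chunks for embedding."""
    return [t[i:i + size] for t in texts for i in range(0, len(t), size)]

def embed(chunks):
    """Stand-in embedding: a real pipeline would call a model API here."""
    return [(c, float(len(c))) for c in chunks]

def load(vectors, store):
    """Write (chunk, vector) pairs into the serving store."""
    store.extend(vectors)
    return store

store = []
pieces = chunk(extract(["  hello world  ", ""]))
load(embed(pieces), store)
print(len(store))  # prints 1: one 11-character chunk became one vector
```

In Airflow, each of these functions would become a task (e.g. via the TaskFlow API), giving the pipeline scheduling, retries, and per-task observability.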
Processes are the building blocks of every organization. Yet, many organizations do not have consistent and repeatable processes. Research shows that projects managed using structured processes leveraging “best practices” consistently show higher performance than those that do not. This session focuses on a method from ISO to improve processes and eliminate defects. Assessing process capability demonstrably helps lower risk associated with the processes.
Main points covered:
• What is a Process Reference Model?
• What is process capability and how do I measure it?
• How to use a Process Assessment Model to assess processes?
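As a rough illustration of measuring process capability, the sketch below maps a process attribute's achievement percentage onto the N/P/L/F rating scale common in ISO-style process assessments. The boundary values follow the widely used 15/50/85 convention and are an assumption here, not taken from the session:

```python
def rate_attribute(achievement_pct):
    """Map a process-attribute achievement percentage onto the
    N/P/L/F rating scale used in ISO-style process assessments."""
    if achievement_pct <= 15:
        return "N"   # Not achieved
    if achievement_pct <= 50:
        return "P"   # Partially achieved
    if achievement_pct <= 85:
        return "L"   # Largely achieved
    return "F"       # Fully achieved

print([rate_attribute(p) for p in (10, 40, 70, 95)])  # ['N', 'P', 'L', 'F']
```

A full assessment would rate each process attribute this way and then derive the process's capability level from the combined ratings.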
Presenter:
Peter Davis is the Principal of Peter Davis Associates, a management consulting firm specializing in Governance, Security, and Audit. Prior to founding PDA, Mr. Davis’ private sector experience included stints with two large Canadian banks and a manufacturing company. He was formerly a principal in the Information Systems Audit practice of Ernst & Young. In the public sector, Mr. Davis was Director of Information Systems Audit in the Office of the Provincial Auditor (Ontario), where he had oversight audit responsibilities for all Ontario crown corporations, agencies and boards.
Mr. Davis has written or co-written 13 books including “Project Management Process Capability Assessment,” “Lean Six Sigma Secrets for the CIO,” and “Hacking Wireless Networks for Dummies.” Peter currently teaches COBIT 5 Foundation/Implementation/Assessor/Implementing NIST Cyber-security Framework using COBIT 5, ISO 20000 FC/LI/LA, ISO 27001 LI/LA, ISO 27032 LM, ISO 27005 RM, and ISO 31000 RM.
Organizer: Ardian Berisha
Date: September 5th, 2018
Recorded webinar link: https://youtu.be/NECQ5Angadw
Slidedeck from our seminar about Machine Learning (07/11/2014)
Topics covered:
- What is Machine Learning?
- Techniques (clustering, classification, ...)
- Tools (Mahout, R, Spark MLlib, Weka, ...)
- Practical examples of Machine Learning applications
- How to embed Machine Learning in software development
- Demos
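As a tiny taste of the classification techniques listed above, here is a nearest-centroid classifier in plain Python. It is a toy example invented for this summary, not code from the slides:

```python
def nearest_centroid(train, labels, point):
    """Tiny classification example: assign `point` to the class whose
    training-set centroid (mean point) is closest in Euclidean distance."""
    groups = {}
    for x, y in zip(train, labels):
        groups.setdefault(y, []).append(x)

    def centroid(pts):
        return [sum(coord) / len(pts) for coord in zip(*pts)]

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return min(groups, key=lambda y: dist2(centroid(groups[y]), point))

X = [[0, 0], [1, 0], [10, 10], [11, 10]]
y = ["a", "a", "b", "b"]
print(nearest_centroid(X, y, [0.5, 0.2]))  # prints a
```

Libraries like Weka, Spark MLlib, or scikit-learn provide far richer classifiers, but this captures the core idea of learning class summaries from labeled data and predicting by similarity.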
MLflow is an MLOps tool that enables data scientists to quickly productionize their machine learning projects. To achieve this, MLflow has four major components: Tracking, Projects, Models, and Registry. MLflow lets you train, reuse, and deploy models with any library and package them into reproducible steps. It is designed to work with any machine learning library and requires minimal changes to integrate into an existing codebase. In this session, we will cover common pain points of machine learning developers, such as experiment tracking, reproducibility, deployment tooling, and model versioning. Get ready to get your hands dirty with a quick ML project using MLflow, releasing it to production to understand the MLOps lifecycle.
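The experiment-tracking idea behind MLflow's Tracking component can be sketched in a few lines. This is a conceptual stand-in, deliberately not MLflow's actual API:

```python
import time

class RunTracker:
    """Minimal stand-in for the experiment-tracking concept: record the
    parameters and metrics of every training run so results are
    comparable and reproducible later."""
    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics):
        run = {"id": len(self.runs), "time": time.time(),
               "params": params, "metrics": metrics}
        self.runs.append(run)
        return run["id"]

    def best_run(self, metric):
        # Return the run with the highest value for `metric`.
        return max(self.runs, key=lambda r: r["metrics"][metric])

tracker = RunTracker()
tracker.log_run({"lr": 0.1}, {"acc": 0.81})
tracker.log_run({"lr": 0.01}, {"acc": 0.87})
print(tracker.best_run("acc")["params"])  # prints {'lr': 0.01}
```

MLflow's real Tracking API adds artifact storage, a UI, and a shared server, but the core value is the same: every run's configuration and outcome is recorded and queryable.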
Drifting Away: Testing ML Models in Production – Databricks
Deploying machine learning models has become a relatively frictionless process. However, properly deploying a model with a robust testing and monitoring framework is a vastly more complex task. There is no one-size-fits-all solution when it comes to productionizing ML models, oftentimes requiring custom implementations utilising multiple libraries and tools. There is, however, a set of core statistical tests and metrics one should have in place to detect phenomena such as data and concept drift, to prevent models from becoming unknowingly stale and detrimental to the business.
Combining our experiences from working with Databricks customers, we do a deep dive on how to test your ML models in production using open source tools such as MLflow, SciPy and statsmodels. You will come away from this talk armed with knowledge of the key tenets for testing both model and data validity in production, along with a generalizable demo which uses MLflow to assist with the reproducibility of this process.
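The drift signal behind such tests can be illustrated with a pure-Python two-sample Kolmogorov-Smirnov statistic. In practice you would use `scipy.stats.ks_2samp`, which also returns a p-value, as the talk suggests; the samples below are invented:

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap between
    the two empirical CDFs. A large value suggests the live feature
    distribution has drifted away from the training distribution."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sample, x):
        return sum(1 for v in sample if v <= x) / len(sample)

    # The maximum gap always occurs at one of the observed points.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)

train = [1, 2, 3, 4, 5]
live_same = [1, 2, 3, 4, 5]
live_shifted = [11, 12, 13, 14, 15]
print(ks_statistic(train, live_same))     # 0.0  (no drift)
print(ks_statistic(train, live_shifted))  # 1.0  (complete separation)
```

A production monitor would compute this per feature on a schedule and alert (or trigger retraining) when the statistic's p-value falls below a chosen threshold.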
This session is a continuation of “Automated Production Ready ML at Scale” from the last Spark + AI Summit Europe. In this session you will learn how H&M has evolved its reference architecture covering the entire MLOps stack, addressing common challenges in AI and machine learning products such as development efficiency, end-to-end traceability, and speed to production.
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa... – Databricks
Airbnb has a wide variety of ML problems ranging from models on traditional structured data to models built on unstructured data such as user reviews, messages and listing images. The ability to build, iterate on, and maintain healthy machine learning models is critical to Airbnb’s success. Many ML Platforms cover data collection, feature engineering, training, deploying, productionalization, and monitoring but few, if any, do all of the above seamlessly.
Bighead aims to tie together various open source and in-house projects to remove incidental complexity from ML workflows. Bighead is built on Python and Spark and can be used in modular pieces as each ML problem presents unique challenges. Through standardization of the path to production, training environments and the methods for collecting and transforming data on Spark, each model is reproducible and iterable.
This talk covers the architecture, the problems that each individual component and the overall system aim to solve, and a vision for the future of machine learning infrastructure. Bighead is widely adopted at Airbnb, and we have a variety of models running in production. We have seen overall model development time go down from many months to days on Bighead. We plan to open source Bighead to allow the wider community to benefit from our work.
MLOps (a compound of “machine learning” and “operations”) is a practice for collaboration and communication between data scientists and operations professionals to help manage the production machine learning lifecycle. Similar to the DevOps term in the software development world, MLOps looks to increase automation and improve the quality of production ML while also focusing on business and regulatory requirements. MLOps applies to the entire ML lifecycle - from integrating with model generation (software development lifecycle, continuous integration/continuous delivery), orchestration, and deployment, to health, diagnostics, governance, and business metrics.
To watch the full presentation click here: https://info.cnvrg.io/mlopsformachinelearning
In this webinar, we’ll discuss core practices in MLOps that will help data science teams scale to the enterprise level. You’ll learn the primary functions of MLOps, and what tasks are suggested to accelerate your teams machine learning pipeline. Join us in a discussion with cnvrg.io Solutions Architect, Aaron Schneider, and learn how teams use MLOps for more productive machine learning workflows.
- Reduce friction between science and engineering
- Deploy your models to production faster
- Health, diagnostics and governance of ML models
- Kubernetes as a core platform for MLOps
- Support advanced use-cases like continual learning with MLOps
ROCm and Distributed Deep Learning on Spark and TensorFlowDatabricks
ROCm, the Radeon Open Ecosystem, is an open-source software foundation for GPU computing on Linux. ROCm supports TensorFlow and PyTorch using MIOpen, a library of highly optimized GPU routines for deep learning. In this talk, we describe how Apache Spark is a key enabling platform for distributed deep learning on ROCm, as it enables different deep learning frameworks to be embedded in Spark workflows in a secure end-to-end machine learning pipeline. We will analyse the different frameworks for integrating Spark with Tensorflow on ROCm, from Horovod to HopsML to Databrick's Project Hydrogen. We will also examine the surprising places where bottlenecks can surface when training models (everything from object stores to the Data Scientists themselves), and we will investigate ways to get around these bottlenecks. The talk will include a live demonstration of training and inference for a Tensorflow application embedded in a Spark pipeline written in a Jupyter notebook on Hopsworks with ROCm.
DataOps: Nine steps to transform your data science impact Strata London May 18Harvinder Atwal
According to Forrester Research, only 22% of companies are currently seeing a significant return from data science expenditures. Most data science implementations are high-cost IT projects, local applications that are not built to scale for production workflows, or laptop decision support projects that never impact customers. Despite this high failure rate, we keep hearing the same mantra and solutions over and over again. Everybody talks about how to create models, but not many people talk about getting them into production where they can impact customers.
Harvinder Atwal offers an entertaining and practical introduction to DataOps, a new and independent approach to delivering data science value at scale, used at companies like Facebook, Uber, LinkedIn, Twitter, and eBay. The key to adding value through DataOps is to adapt and borrow principles from Agile, Lean, and DevOps. However, DataOps is not just about shipping working machine learning models; it starts with better alignment of data science with the rest of the organization and its goals. Harvinder shares experience-based solutions for increasing your velocity of value creation, including Agile prioritization and collaboration, new operational processes for an end-to-end data lifecycle, developer principles for data scientists, cloud solution architectures to reduce data friction, self-service tools giving data scientists freedom from bottlenecks, and more. The DataOps methodology will enable you to eliminate daily barriers, putting your data scientists in control of delivering ever-faster cutting-edge innovation for your organization and customers.
In this presentation, two different data-sets are being collected to implement the machine learning classification techniques introduced from introduction to data mining and machine learning coursework. Both data-sets are collected by analyzing their output and team members interest. Following are the data-sets named as, Electricity grid stability simulated data-set and Face Recognition on Olivetti Data set
Building and deploying LLM applications with Apache AirflowKaxil Naik
Behind the growing interest in Generate AI and LLM-based enterprise applications lies an expanded set of requirements for data integrations and ML orchestration. Enterprises want to use proprietary data to power LLM-based applications that create new business value, but they face challenges in moving beyond experimentation. The pipelines that power these models need to run reliably at scale, bringing together data from many sources and reacting continuously to changing conditions.
This talk focuses on the design patterns for using Apache Airflow to support LLM applications created using private enterprise data. We’ll go through a real-world example of what this looks like, as well as a proposal to improve Airflow and to add additional Airflow Providers to make it easier to interact with LLMs such as the ones from OpenAI (such as GPT4) and the ones on HuggingFace, while working with both structured and unstructured data.
In short, this shows how these Airflow patterns enable reliable, traceable, and scalable LLM applications within the enterprise.
https://airflowsummit.org/sessions/2023/keynote-llm/
Processes are the building blocks of every organization. Yet, many organizations do not have consistent and repeatable processes. Research shows that projects managed using structured processes leveraging “best practices” consistently show higher performance than those that do not. This session focuses on a method from ISO to improve processes and eliminate defects. Assessing process capability demonstrably helps lower risk associated with the processes.
Main points covered:
• What is a Process Reference Model?
• What is process capability and how do I measure it?
• How to use a Process Assessment Model to assess processes?
Presenter:
Peter Davis is the Principal of Peter Davis Associates, a management consulting firm specializing in Governance, Security, and Audit. Prior to founding PDA, Mr. Davis’ private sector experience included stints with two large Canadian banks and a manufacturing company. He was formerly a principal in the Information Systems Audit practice of Ernst & Young. In the public sector, Mr. Davis was Director of Information Systems Audit in the Office of the Provincial Auditor (Ontario), where he had oversight audit responsibilities for all Ontario crown corporations, agencies and boards.
Mr. Davis has written or co-written 13 books including “Project Management Process Capability Assessment,” “Lean Six Sigma Secrets for the CIO,” and “Hacking Wireless Networks for Dummies.” Peter currently teaches COBIT 5 Foundation/Implementation/Assessor/Implementing NIST Cyber-security Framework using COBIT 5, ISO 20000 FC/LI/LA ISO 27001 LI/LA, ISO 27032 LM, ISO 27005 RM, and ISO 31000 RM.
Organizer: Ardian Berisha
Date: September 5th, 2018
Recorded webinar link: https://youtu.be/NECQ5Angadw
Slidedeck from our seminar about Machine Learning (07/11/2014)
Topics covered:
- What is Machine Learning?
- Techiques (clustering, classification, ...)
- Tools (Mahout, R, Spark MlLib, Weka, ...)
- Practical example of Machine Learning applications
- How to embed Machine Learning in software development
- Demos
MLflow is an MLOps tool that enables data scientists to quickly productionize their machine learning projects. To achieve this, MLflow has four major components: Tracking, Projects, Models, and Registry. MLflow lets you train, reuse, and deploy models with any library and package them into reproducible steps. MLflow is designed to work with any machine learning library and requires minimal changes to integrate into an existing codebase. In this session, we will cover the common pain points of machine learning developers, such as tracking experiments, reproducibility, deployment tooling, and model versioning. Get ready to get your hands dirty by building a quick ML project with MLflow and releasing it to production to understand the MLOps lifecycle.
Drifting Away: Testing ML Models in Production (Databricks)
Deploying machine learning models has become a relatively frictionless process. However, properly deploying a model with a robust testing and monitoring framework is a vastly more complex task. There is no one-size-fits-all solution when it comes to productionizing ML models, oftentimes requiring custom implementations utilising multiple libraries and tools. There are however, a set of core statistical tests and metrics one should have in place to detect phenomena such as data and concept drift to prevent models from becoming unknowingly stale and detrimental to the business.
Combining our experiences from working with Databricks customers, we do a deep dive on how to test your ML models in production using open source tools such as MLflow, SciPy and statsmodels. You will come away from this talk armed with knowledge of the key tenets for testing both model and data validity in production, along with a generalizable demo which uses MLflow to assist with the reproducibility of this process.
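The core drift test the talk describes can be done with SciPy's `ks_2samp`; as a dependency-free sketch of the same idea, here is a hand-rolled two-sample Kolmogorov–Smirnov statistic with a hypothetical drift threshold (the 0.2 cut-off and the Gaussian test data are illustrative, not from the talk):

```python
import bisect
import random

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_sample, x):
        # Fraction of the sample that is <= x.
        return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)

random.seed(0)
train = [random.gauss(0.0, 1.0) for _ in range(1000)]        # training data
prod_ok = [random.gauss(0.0, 1.0) for _ in range(1000)]      # same distribution
prod_drifted = [random.gauss(1.5, 1.0) for _ in range(1000)]  # mean shift

DRIFT_THRESHOLD = 0.2  # hypothetical cut-off; tune per feature
drift_ok = ks_statistic(train, prod_ok)
drift_bad = ks_statistic(train, prod_drifted)
```

In production you would run such a test per feature on a schedule and alert (or retrain) when the statistic crosses the threshold.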
This session is a continuation of “Automated Production Ready ML at Scale” from the last Spark + AI Summit Europe. In this session you will learn how H&M evolves its reference architecture, covering the entire MLOps stack and addressing common challenges in AI and machine learning products, such as development efficiency, end-to-end traceability, and speed to production.
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa... (Databricks)
Airbnb has a wide variety of ML problems ranging from models on traditional structured data to models built on unstructured data such as user reviews, messages and listing images. The ability to build, iterate on, and maintain healthy machine learning models is critical to Airbnb’s success. Many ML Platforms cover data collection, feature engineering, training, deploying, productionalization, and monitoring but few, if any, do all of the above seamlessly.
Bighead aims to tie together various open source and in-house projects to remove incidental complexity from ML workflows. Bighead is built on Python and Spark and can be used in modular pieces as each ML problem presents unique challenges. Through standardization of the path to production, training environments and the methods for collecting and transforming data on Spark, each model is reproducible and iterable.
This talk covers the architecture, the problems that each individual component and the overall system aim to solve, and a vision for the future of machine learning infrastructure. Bighead is widely adopted at Airbnb, with a variety of models running in production. We have seen overall model development time go down from many months to days on Bighead. We plan to open source Bighead to allow the wider community to benefit from our work.
This talk was prepared as a note to my future self when working on future projects. I reflect on the tasks commonly involved in crafting visualizations, point out the common things to expect, pitfalls and provide recommendations. Along the way I include examples of 3 different applications of information/data visualization and details on how each project was started and developed.
These slides were from my guest lectures in:
(1) the InfoVis class at UC Berkeley iSchool on Feb 27, 2017. Thank you Prof. Marti Hearst for the invitation.
(2) the DataVis class at GATech on Apr 5, 2017. Thank you Prof. Rahul C. Basole for the invitation.
Paper presentation at the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI). Vancouver, BC (May 10, 2011)
More info:
http://www.cs.umd.edu/hcil/lifeflow
Abstract:
Event sequence analysis is an important task in many domains: medical researchers may study the patterns of transfers within the hospital for quality control; transportation experts may study accident response logs to identify best practices. In many cases they deal with thousands of records. While previous research has focused on searching and browsing, overview tasks are often overlooked. We introduce a novel interactive visual overview of event sequences called LifeFlow. LifeFlow is scalable, can summarize all possible sequences, and represents the temporal spacing of the events within sequences. Two case studies with healthcare and transportation domain experts are presented to illustrate the usefulness of LifeFlow. A user study with ten participants confirmed that after 15 minutes of training, novice users were able to rapidly answer questions about the prevalence and temporal characteristics of sequences, find anomalies, and gain significant insight from the data.
Real-time event processing monitors the incoming data stream and initiates action based on detected events like fraud, error or performance degradation. These events are often used to issue alerts and notifications, take responsive action, or to populate a monitoring dashboard. In this session, we will walk through different use cases for event processing and demonstrate how to build a scalable pipeline for tracking IoT device status. AWS services to be covered include: AWS Lambda and the Kinesis Client Library (KCL).
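The session demonstrates this pattern with AWS Lambda and the Kinesis Client Library; as a service-agnostic sketch of the core idea (hypothetical record format and thresholds), here is a monitor that scans an incoming stream and emits alerts on detected error spikes:

```python
from collections import deque

def detect_events(stream, error_threshold=3, window=5):
    """Scan a stream of status records and yield an alert whenever the
    number of 'error' statuses within the sliding window reaches the
    threshold."""
    recent = deque(maxlen=window)
    for record in stream:
        recent.append(record)
        errors = [r for r in recent if r["status"] == "error"]
        if len(errors) >= error_threshold:
            yield {"alert": "error_spike",
                   "device": record["device"],
                   "count": len(errors)}
            recent.clear()  # avoid re-alerting on the same burst

readings = [{"device": "sensor-1", "status": s}
            for s in ["ok", "error", "ok", "error", "error", "ok", "ok"]]
alerts = list(detect_events(readings))
```

In a real pipeline the `stream` would be records consumed from Kinesis and the `yield` would publish a notification or update a dashboard.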
I present the design and implementation of an ontology for scholarly event description (SEDE) to provide a backbone to represent, collect, share, and allow inference from scholarly event information.
Business Event Processing Beyond the Horizon (Opher Etzion)
This is a presentation given in IBM Websphere IMPACT 2009, May 2009, Las Vegas together with Kyle Brown. It contains some thoughts that are demonstrated through customers' scenarios on future functionality in event processing products.
DataEngConf SF16 - Multi-temporal Data Structures (Hakka Labs)
A mind-bending way of dealing with time syncing when aggregating data from many disparate sources. Talk by Jasmine Tsai and Alyssa Kwan, Clover Health. To hear about future conferences go to http://dataengconf.com
Mapping and analysis use in humanitarian aid and development that can provide better insight and action.
Open Data, open platforms, and open collaboration.
These slides review the research of our lab since 2016 on applied deep learning, starting from our participation in the TRECVID Instance Search 2014, moving into video analysis with CNN+RNN architectures, and our current efforts in sign language translation and production.
As part of their daily work, developers interact with Integrated Development Environments (IDE), generating thousands of events. Together with other aspects of development, this data also captures the modus operandi of the developer, including all the program entities she interacted with during a development session. This "working set" (or context) is leveraged by developers to create and maintain their mental model of the software system at hand. Understanding how developers navigate and interact with source code during a development session is an open question.
We present a novel visual approach to understand how working sets evolve during a development session. The visualization incrementally depicts all the program entities involved in a development session, the intensity of the developer activity on them, and the navigation paths that occurred between them. We visualized about a thousand development sessions, and categorized them according to their visual properties.
Mapping, open data, and open platforms in support of crisis response, humanitarian aid and development. How organizations can better understand and share their impacts.
This was prepared for a presentation of http://swiftapp.org at InSTEDD. It gives an overview of the latest designs (which are changing quickly) and a backgrounder on how we got to this point.
Brief introduction to CEP and terminology:
o Drools Vision
o Drools Fusion: Complex Event Processing extensions
o Event Declaration and Semantics
o Event Cloud, Streams and the Session Clock
o Temporal Reasoning
o Sliding Window Support
o Streams Support
o Memory Management
“Which visualization library should I use?” Typically, making this decision is not about whether one library is “better” than another, but whether the specific library is more suitable for what the developer is trying to achieve. To answer this question thoroughly, we need to better understand the design space of visualization libraries. The talk will give a tour of many kinds of visualization libraries on the web across the design space, while explaining the framework and design philosophy that the audience can pick up along the way. The audience will expand their horizons and become more aware of the wide universe of libraries. The next time they come across a new package, they can use this framework as a lens to analyze its offerings and how it differs from or resembles the libraries they already know.
Encodable: Configurable Grammar for Visualization Components (Krist Wongsuphasawat)
There are so many libraries of visualization components nowadays, with APIs that often differ from one another. Could these components be more similar, both in terms of their APIs and common functionalities? For someone developing a new visualization component, what should the API look like? This work drew inspiration from visualization grammar, decoupled the grammar from its rendering engine, and adapted it into a configurable grammar for individual components called Encodable. Encodable helps component authors define a grammar for their components and parse encoding specifications from users into utility functions for the implementation.
This talk was prepared as a note to my future self when working on future projects. I reflect on the tasks commonly involved in crafting visualizations, point out the common things to expect, pitfalls and provide recommendations. Along the way I include examples of different applications of information/data visualization and details on how each project was started and developed.
These slides were from my (remote) guest lecture in InfoVis class for UC Berkeley iSchool on Apr 8, 2020 during the COVID-19 shelter-in-place. Thank you Prof. Marti Hearst for the invitation.
Slides from the VIS in practice panel "Increasing the Impact of Visualization Research" during IEEE VIS 2017 in Phoenix, AZ. http://www.visinpractice.rwth-aachen.de/panel.html
Reveal the talking points of every episode of Game of Thrones from fans' conv... (Krist Wongsuphasawat)
You may not be sure how Lord Varys collects information from his little birds, but in this talk you will hear how we can collect information from our little birds.
@kristw shares a behind-the-scenes view of his latest data visualization project, which shows how each #GameOfThrones episode was discussed on Twitter. Using data visualization, we can extract and reveal the stories of every episode from fans’ Tweets.
https://interactive.twitter.com/game-of-thrones
These slides are from a talk given at Bay Area d3 User Group meetup on June 9, 2016.
http://www.meetup.com/Bay-Area-d3-User-Group/events/231281298
In this talk, I reflect on the tasks commonly involved in crafting visualizations and show examples of different applications of information/data visualization. Along this ride I will share my workflow, point out the common pitfalls and provide recommendations.
These slides were from my guest lecture in the InfoVis class at UC Berkeley iSchool on Apr 11, 2016. Thank you Prof. Marti Hearst for the invitation.
Adventure in Data: A tour of visualization projects at Twitter (Krist Wongsuphasawat)
Guest lecture at Prof. David Gotz's UNC Chapel Hill INLS 690 Visual Analytics class (Given remotely) on Nov 10, 2015.
Many demos can also be accessed from interactive.twitter.com and kristw.yellowpigz.com
d3Kit is a set of tools to speed up D3-related project development. It is a lightweight library that helps you with the basic groundwork tasks you need when building visualizations with D3.
Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc... (Krist Wongsuphasawat)
Slides from my talk at the IEEE Conference on Visual Analytics Science and Technology (VAST) 2014 in Paris, France.
ABSTRACT
Logging user activities is essential to data analysis for internet products and services.
Twitter has built a unified logging infrastructure that captures user activities across all clients it owns, making it one of the largest datasets in the organization.
This paper describes challenges and opportunities in applying information visualization to log analysis at this massive scale, and shows how various visualization techniques can be adapted to help data scientists extract insights.
In particular, we focus on two scenarios: (1) monitoring and exploring a large collection of log events, and (2) performing visual funnel analysis on log data with tens of thousands of event types.
Two interactive visualizations were developed for these purposes:
we discuss design choices and the implementation of these systems, along with case studies of how they are being used in day-to-day operations at Twitter.
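Funnel analysis over log events, as described in the abstract, amounts to counting how far each session progresses through an ordered list of steps; a minimal sketch with hypothetical step names (the talk's actual systems are far more elaborate):

```python
def funnel_counts(sessions, steps):
    """For each funnel step, count the sessions that reach it with all
    earlier steps appearing before it, in order."""
    counts = [0] * len(steps)
    for session in sessions:
        pos = 0  # search position within the session
        for i, step in enumerate(steps):
            try:
                pos = session.index(step, pos) + 1
            except ValueError:
                break  # session drops out of the funnel here
            counts[i] += 1
    return counts

sessions = [
    ["home", "search", "product", "checkout"],
    ["home", "search", "exit"],
    ["home", "product"],
]
counts = funnel_counts(sessions, ["home", "search", "product", "checkout"])
```

The resulting per-step counts are exactly what a funnel visualization renders as successively narrower bars.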
Making Sense of Millions of Thoughts: Finding Patterns in the Tweets (Krist Wongsuphasawat)
I gave this presentation at Workshop on Interactive Language Learning, Visualization, and Interfaces / ACL 2014 in Baltimore, MD on June 27, 2014.
http://nlp.stanford.edu/events/illvi2014/index.html
ABSTRACT
Everyday on Twitter, there are millions of thoughts that are captured and shared to the world in the form of 140-character messages, or Tweets. There are many things we could learn from these thoughts if we could figure out a way to digest this gigantic dataset. Visualization is one of the many ways to extract information from these Tweets. In this presentation, I will talk about several visualizations based on Tweets, as well as share experiences and challenges from working with Tweet data.
A talk at Data Visualization Summit 2014 in Santa Clara, CA
ABSTRACT: What is the thought process that transforms data into visualizations? In this presentation, I will talk about guidelines that will help you when starting with raw data, walk through standard techniques, and also discuss things to keep in mind when making design decisions.
1. Data Visualization Summit, San Francisco, CA, Apr 11, 2013
Visualizations for Event Sequences Exploration
Krist Wongsuphasawat, Data Visualization Scientist, Twitter, Inc. (@kristw)
22. Event sequence: glyphs on a timeline + interval width + event types (colors, shapes) + high density
23. high density: too many overlaps and occlusions along the time axis
24. high density >> facet. Google Chrome Developer Tools facets its Timeline events into loading, scripting, and rendering & painting (Google Chrome > Developer Tools > Timeline).
25. high density >> facet
Lifelines
http://www.cs.umd.edu/lifelines
26. high density >> binning
British History Timeline
bin by year
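The binning idea on this slide is simple to express; a sketch with made-up event dates (nothing here comes from the British History Timeline itself):

```python
from collections import Counter
from datetime import date

events = [date(2011, 3, 5), date(2011, 8, 19), date(2012, 1, 2),
          date(2012, 6, 30), date(2012, 11, 11), date(2013, 4, 1)]

# Bin events by year so a dense timeline becomes a handful of counts,
# one bar per year instead of one glyph per event.
counts_by_year = Counter(d.year for d in events)
```

Changing the key function (`d.month`, `d.isocalendar()[1]`, ...) changes the bin granularity.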
27. high density >> aggregation. CloudLines: raw event data → kernel density estimation + importance function + truncation → encode as cloud size.
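The first stage of that CloudLines-style pipeline can be approximated in a few lines; this is a simplified sketch (Gaussian kernel only, omitting the importance function and truncation steps):

```python
import math

def kernel_density(event_times, query_time, bandwidth=1.0):
    """Smooth raw event timestamps into a continuous density estimate
    with a Gaussian kernel; bursts of events produce high density."""
    norm = 1.0 / (len(event_times) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((query_time - t) / bandwidth) ** 2)
        for t in event_times)

events = [1.0, 1.2, 1.5, 8.0]  # a burst near t=1 and one isolated event
burst_density = kernel_density(events, 1.2)
sparse_density = kernel_density(events, 8.0)
```

The density values would then be mapped to "cloud size" (mark area or opacity) along the timeline.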
28. high density >> aggregation
CloudLines (2)
Krstajic, M., Bertini, E., & Keim, D. A. (2011).
CloudLines: Compact Display of Event Episodes in Multiple Time-Series.
IEEE Transactions on Visualization and Computer Graphics, 17(12), 2432.
29. Event sequence: glyphs on a timeline (linear or non-linear) + interval width + event types (colors, shapes) + high density (facet, aggregation, binning)
30. circular timeline. Linear axis: 2008, 2009, 2010, 2011, 2012. Circular axis: the months Jan–Dec arranged around a circle, which reveals repeating patterns.
31. circular timeline (2)
Traffic Incidents
VanDaniker, M. (2010). Leverage of Spiral Graph for Transportation System Data Visualization.
Transportation Research Record: Journal of the Transportation Research Board, 2165, 79–88.
33. stacked timeline (2)
Tweet Volume
Rios, M., & Lin, J. (2012). Distilling Massive Amounts of Data into Simple Visualizations : Twitter Case Studies.
Proceedings of the Workshop on Social Media Visualization (SocMedVis) at ICWSM 2012 (pp. 22–25).
34. Event sequence: glyphs on a timeline (linear or non-linear) + interval width + event types (colors, shapes) + high density (facet, aggregation, binning)
53. aggregation by time
temporal summary
Wang, T. D., Plaisant, C., Shneiderman, B., Spring, N., Roseman, D., Marchand, G., Mukherjee, V., et al. (2009).
Temporal Summaries: Supporting Temporal Categorical Searching, Aggregation and Comparison.
IEEE Transactions on Visualization and Computer Graphics, 15(6), 1049–1056.
54. collection: event sequences 1, 2, ..., n. Interactions: align, rank, filter, search, group. Aggregation: by time, by sequence.
55. aggregation by sequence: LifeFlow. E.g., (1) What happened to the patients after they arrived? (align by Arrival) (2) What happened to the patients before & after the ICU? (align by ICU)
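LifeFlow's align-by-event question ("what happened after arrival?") boils down to re-indexing each sequence around a sentinel event; a minimal sketch with made-up patient records (not LifeFlow's actual implementation):

```python
def align(sequences, sentinel):
    """Split each sequence at its first occurrence of the sentinel event,
    returning (before, after) halves so sequences can be compared
    relative to the alignment point. Sequences lacking it are dropped."""
    aligned = []
    for seq in sequences:
        if sentinel in seq:
            i = seq.index(sentinel)
            aligned.append((seq[:i], seq[i + 1:]))
    return aligned

patients = [
    ["Arrival", "ER", "ICU", "Discharge"],
    ["Arrival", "ICU", "Die"],
    ["Transfer", "Floor"],  # no Arrival event; excluded from the view
]
after_arrival = [after for _, after in align(patients, "Arrival")]
```

Aggregating the `after` halves by their common prefixes is what produces the LifeFlow tree of "what happened next".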
57. Demo
LifeFlow
Wongsuphasawat, K., Guerra Gómez, J. A., Plaisant, C., Wang, T. D., Taieb-Maimon, M., & Shneiderman, B. (2011).
LifeFlow: Visualizing an Overview of Event Sequences. Proceedings of CHI'2011 (pp. 1747–1756).
81. Outflow encoding: past and future around the alignment point. A node's horizontal position shows the sequence of states (e1, e2, ...); a node's height is the number of records; color is the outcome measure; a time edge's width is the duration of the transition; link edges connect nodes; end-of-path markers show where sequences terminate.
83. Wongsuphasawat, K., & Gotz, D. (2012).
Exploring Flow, Factors, and Outcomes of Temporal Event Sequences with the Outflow Visualization.
IEEE Transactions on Visualization and Computer Graphics, 18(12), 2659–2668.
84. collection: event sequences 1, 2, ..., n. Interactions: align, rank, filter, search, group. Aggregation: by time, by sequence + outcome.
88. Event Sequence Analysis at eBay: alignment.
Shen, Z., Wei, J., Sundaresan, N., & Ma, K.-L. (2012). Visual analysis of massive web session data. IEEE Symposium on Large Data Analysis and Visualization (LDAV), 65–72.
89. Event Sequence Analysis at Twitter
• Data: TBs of session logs every day
• Complexity: millions of sessions per day; 1000+ types of events; long sessions
• Goal: overview of how users are using Twitter
• Technique: LifeFlow
Simplify!
90. Event Sequence Analysis at Twitter (2)
• So far:
– millions of sessions per day → millions of sessions on the same screen
– 1000+ types of events → simplified sets of events (e.g., pages only, selected pages only)
– long sessions → limited session length to 10–20 events
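The simplifications on this slide (keep only selected event types, cap session length) are easy to express; a sketch with a hypothetical page whitelist and length cap (the real pipeline ran as Pig jobs on Hadoop):

```python
def simplify(session, keep_types, max_len=10):
    """Reduce a raw session to a short sequence of interesting events:
    filter to a whitelist of event types, then truncate the length."""
    return [e for e in session if e in keep_types][:max_len]

PAGES = {"home", "profile", "search", "tweet"}  # hypothetical whitelist
raw = ["home", "scroll", "scroll", "search", "click", "tweet", "home"]
clean = simplify(raw, PAGES)
```

After this reduction, millions of sessions collapse into far fewer distinct sequences, which is what makes an on-screen overview feasible.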
92. Event Sequence Analysis at Twitter (4)
• Implementation: Hadoop; web-based (JS)
• More: stored preprocessed data in a smaller DB (MySQL/Vertica)
Pipeline: HDFS → batch Pig scripts → MySQL/Vertica → interactive visualization
93. Takeaway Messages
• Life is full of event sequences.
• How to visualize an event sequence
Krist Wongsuphasawat
krist.wongz@gmail.com
@kristw
94. Event sequence: glyphs on a timeline (linear or non-linear) + interval width + event types (colors, shapes) + high density (facet, aggregation, binning)
95. Takeaway Messages
• Life is full of event sequences.
• How to visualize an event sequence
• How to visualize collection of event seq.
Krist Wongsuphasawat
krist.wongz@gmail.com
@kristw
96. collection: event sequences 1, 2, ..., n. Interactions: align, rank, filter, search, group. Aggregation: by time, by sequence + outcome.
97. Takeaway Messages
• Life is full of event sequences.
• How to visualize an event sequence
• How to visualize collection of event seq.
• Applicable to big data
• New techniques happen everyday.
Krist Wongsuphasawat
krist.wongz@gmail.com
@kristw