Slides for a talk given to the London Atlassian User Group, Jan 2017: how to get started with Python to extract data from Jira and produce charts for your Agile team.
UX Analytics for Data-driven Product Development, by Trieu Nguyen
- UX analytics can help companies turn their user data into real products by discovering user interests in real-time.
- Mobile analytics is important because mobile devices are becoming the dominant way users access the web, and big data and analytics are major trends.
- Core KPIs for mobile analytics include users, sessions, events, and other metrics to understand user behavior and how to engage app users.
Data Analytics for Mobile App Development, by Trieu Nguyen
This document discusses using data analytics for mobile app development. It recommends analyzing user behavior and interests through metrics like users, sessions, and events to improve the user experience and inform business decisions. The document provides an example of a mobile advertising app that tracked user taps and social sharing to generate analytics and integrate with Facebook data. It advocates keeping analytics implementations simple while designing architectures that can handle large volumes of data.
Reactive Realtime Big Data with Open Source Lambda Architecture (TechCampVN 2014), by Trieu Nguyen
This document discusses using a reactive lambda architecture with open source tools to solve real-time big data problems. It begins by defining big data and explaining that simply having data is not enough: you need to solve the right problems with the right team and tools. It then presents three example problems that could benefit from real-time big data solutions: disaster prediction and response, understanding customers through social media data, and optimizing marketing campaigns in real time. The document proposes a reactive lambda architecture built on open source frameworks like Hadoop, Spark and Storm, together with storage systems like Redis, HDFS and HBase, to build streaming data pipelines and query data in real time. It demonstrates this through a social media user tracking and personalized recommendations use case.
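The query-time merge of batch and speed layers that a lambda architecture performs can be sketched in a few lines of Python. All names and numbers below are illustrative, not taken from the talk:

```python
# Minimal lambda-architecture query merge: the batch layer precomputes
# counts up to a cutoff timestamp; the speed layer keeps counts only for
# events that arrived after that cutoff; a query merges both views.

batch_view = {"page_a": 1000, "page_b": 500}   # precomputed by Hadoop/Spark
batch_cutoff = 1_700_000_000                   # last timestamp covered by the batch run

speed_view = {}                                # maintained by Storm/Spark Streaming

def on_event(page, ts):
    if ts > batch_cutoff:                      # count only post-batch events
        speed_view[page] = speed_view.get(page, 0) + 1

def query(page):
    # Serve reads by merging the (stale but complete) batch view
    # with the (fresh but partial) realtime view.
    return batch_view.get(page, 0) + speed_view.get(page, 0)

on_event("page_a", 1_700_000_050)
on_event("page_a", 1_700_000_060)
print(query("page_a"))  # 1002
```

Periodically the batch layer recomputes its view over all data and the speed view is reset, which is what keeps the realtime state small.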
Building Your Data-driven Business with Reactive Marketing Technology, by Trieu Nguyen
The document discusses data-driven business and reactive marketing technology. It begins with key questions about data-driven business and the benefits of analytics, and introduces the "9D" model for big data business. Tools for building reactive marketing technology are presented, including Apache Storm, Apache Kafka, Apache Spark, and the Hadoop ecosystem. A case study demonstrates how to build digital marketing software using open source big data tools. The philosophy and a lightweight lambda architecture for building a reactive system are described.
This document introduces the Reactive Data System (RDS) framework called RFX for solving fast data problems reactively. It discusses how RFX was developed to handle common issues like counting pageviews, unique users, and real-time marketing. RFX is an open source, full stack framework that uses various tools like Kafka, Spark, and Redis to process high volumes of event data in real-time for applications like analytics, advertising, and monitoring. The document provides an example architecture and topology for collecting tracking data, processing it through RFX components, and generating reports.
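The pageview and unique-user counting that RFX targets can be sketched as a simple stream consumer. This in-memory Python version stands in for what would typically be Redis counters in a real deployment (INCR for pageviews, a set or HyperLogLog for uniques); all names here are hypothetical:

```python
from collections import defaultdict

class RealtimeCounter:
    """Per-minute pageview and unique-user counts for a tracking event stream.
    In a system like RFX these structures would be Redis keys, not Python dicts."""

    def __init__(self):
        self.pageviews = defaultdict(int)   # (minute, url) -> pageview count
        self.uniques = defaultdict(set)     # (minute, url) -> set of user ids

    def track(self, event):
        key = (event["ts"] // 60, event["url"])   # bucket events by minute
        self.pageviews[key] += 1
        self.uniques[key].add(event["user"])

    def report(self, minute, url):
        key = (minute, url)
        return {"pageviews": self.pageviews[key],
                "unique_users": len(self.uniques[key])}

counter = RealtimeCounter()
for user in ["u1", "u2", "u1"]:
    counter.track({"ts": 120, "url": "/home", "user": user})
print(counter.report(2, "/home"))  # {'pageviews': 3, 'unique_users': 2}
```

For high cardinalities, exact sets are usually replaced by a probabilistic counter such as Redis's HyperLogLog, trading a small error for constant memory.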
The document discusses how to use analytics to drive user experience (UX) design. It covers analytics fundamentals like events, metrics, and user journeys. It also addresses challenges like relying on gut feelings over data and the need to retrospectively track events to validate user journeys. Additionally, it recommends tools to build an analytics infrastructure and emphasizes not being afraid to kill features if data shows they are not working. The overall goal is to make UX more data-driven.
From Data Analytics to Fast Data Intelligence, by Trieu Nguyen
1) How to understand users with Data Analytics?
2) How to build a Real-time Music Recommender System from a Data Stream?
3) How to boost profit with Real-time Cross-sell?
Key Ideas to build Fast Data Intelligence Platform from Open Source Tools:
+ Apache Kafka
+ Apache Spark
+ RFX framework
Using User Behavior for Real-time Advertising, by Trieu Nguyen
1. The document discusses using user behavior data for real-time advertising to optimize revenue.
2. User behavior data can impact revenue by improving the probability users will click ads and the relevance of ads to users' interests.
3. The author proposes building a user behavior database with tools like Apache Spark, Apache Kafka, and Apache HBase to track user behavior and target ads in real-time.
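A minimal sketch of such a behavior-driven ad picker, with an in-memory dict standing in for the proposed HBase-backed profile store fed by Kafka/Spark (all names are hypothetical):

```python
from collections import defaultdict

# user id -> interest category -> engagement count
profiles = defaultdict(lambda: defaultdict(int))

def track(user, category):
    """Record one behavioral event (click, view, tap) against a category."""
    profiles[user][category] += 1

def pick_ad(user, ads):
    """Choose the ad whose category the user has engaged with most,
    i.e. maximize the expected relevance to this user's interests."""
    profile = profiles[user]
    return max(ads, key=lambda ad: profile.get(ad["category"], 0))

track("u1", "travel"); track("u1", "travel"); track("u1", "cars")
ads = [{"id": "a1", "category": "cars"}, {"id": "a2", "category": "travel"}]
print(pick_ad("u1", ads)["id"])  # a2: "travel" has the highest count for u1
```

A production system would score ads with a learned click-through model rather than raw counts, but the shape is the same: look up the user's behavior profile, rank candidate ads against it.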
Slide 3: Fast Data Processing with Kafka, RFX and Redis, by Trieu Nguyen
1. The document discusses using the RFX (Reactive Function X) framework to solve problems with fast data processing.
2. RFX is a design pattern and collection of open source tools that can be used to quickly build data products and implement an agile data pipeline.
3. Examples of how RFX could be used for web analytics are presented, including counting pageviews and unique users in near real-time and detecting DDOS attacks.
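For the DDoS-detection example, the underlying idea is a per-IP sliding-window rate check; here is a toy version with illustrative window and threshold values (the real pipeline would run this over the Kafka event stream):

```python
from collections import defaultdict, deque

WINDOW = 10      # seconds of history to keep per IP
THRESHOLD = 100  # max requests per window before flagging (illustrative)

hits = defaultdict(deque)   # ip -> timestamps of recent requests

def is_ddos(ip, ts):
    """Return True once an IP exceeds THRESHOLD requests within WINDOW seconds."""
    q = hits[ip]
    q.append(ts)
    while q and q[0] <= ts - WINDOW:   # evict requests outside the window
        q.popleft()
    return len(q) > THRESHOLD

# 150 requests from one IP in ~7.5 seconds trips the threshold
flagged = any(is_ddos("10.0.0.1", t * 0.05) for t in range(150))
print(flagged)  # True
```

Real deployments usually also whitelist known crawlers and alert rather than block directly, since a burst alone is not proof of an attack.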
RFX: Full-Stack Technology for Real-time Big Data, by Trieu Nguyen
RFX is a full-stack technology framework for real-time big data processing that was created in 2013 and is used by FPT for analytics tasks on websites like Vnexpress.net and eclick.vn. It is built from open source projects like Akka, Netty, Kafka, Spark, Redis and uses a reactive programming approach to optimize user experience through real-time data processing and business logic. RFX aims to provide a fast data intelligence platform for solving problems like analytics, user segmentation, and automatic optimization of user experiences.
In order to move past the hype and achieve the full potential of machine learning, data scientists and software developers need to work more closely together towards their common goal of delivering well-architected, data-driven applications. Every industry is in the process of being transformed by software and data. It is in the collaboration between data scientists and software developers where the real value can be found by creating integrated data workflows that benefit from the unique knowledge and skillsets of each discipline.
https://www.dncexpo.be/seminar/O105
This document discusses analyzing video data with GraphLab Create. It introduces Dato's products for ingesting, transforming, modeling and deploying machine learning models on unstructured data like images, text, graphs and tabular data. It then outlines a demo of using computer vision and face recognition techniques to match actors' faces from movie frames to subtitles and screenplay text. Instructions are provided for installing GraphLab Create and links shared for additional resources.
Learn to Use Databricks for Data Science, by Databricks
Data scientists face numerous challenges throughout the data science workflow that hinder productivity. As organizations continue to become more data-driven, a collaborative environment is more critical than ever: one that provides easier access and visibility into the data, reports and dashboards built against the data, reproducibility, and insights uncovered within the data. Join us to hear how Databricks' open and collaborative platform simplifies data science by enabling you to run all types of analytics workloads, from data preparation to exploratory analysis and predictive analytics, at scale, all on one unified platform.
This document introduces Dato and its machine learning platform. Dato provides intuitive APIs and toolkits that allow developers to easily create intelligent applications for tasks like recommendation, sentiment analysis, churn prediction, and more. It offers scalable data structures, high performance algorithms, and the ability to quickly develop and deploy machine learning models and services. Customers across various industries have been able to build and operationalize intelligent solutions faster using Dato to solve problems in fraud detection, data matching, recommendations, and other domains.
Predicting Medical Test Results Using Driverless AI, by Sri Ambati
1. poder.IO uses AI to predict customer behavior and personalize experiences. It deploys over 100 models daily using techniques like regression, classification, text analysis and deep learning.
2. Driverless AI is currently used to benchmark models before production and for research cases. It may be used starting Q3 2018 for advertising optimization, content classification, profile matching and look-alike modeling.
3. A joint team from poder.IO and Bayer developed models to predict individual medical test results from healthcare data, without direct lab measures. This could help improve treatment strategies. They used techniques like GLM, GBM, random forest and Driverless AI to develop and compare models for a medical test.
The document discusses using machine learning to assess patient readmission risk and reduce avoidable hospital readmissions. It begins with an introduction of the speaker and an overview of the problem of high readmission rates. It then discusses current analytic approaches and their limitations, and how machine learning can leverage complex data sources like EMRs to provide more precise, real-time risk scoring and insights. The rest of the document focuses on demonstrating Dato's machine learning platform and capabilities for building such applications for predictive readmission risk at scale.
Lambda Architecture 2.0 for Reactive A/B Testing, by Trieu Nguyen
1) What is data-driven business?
2) What and why is Lambda Architecture 2.0?
3) What problems did it solve for us?
4) Workshop with case study:
Building A/B testing tool for digital marketing with Lambda Architecture 2.0
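One common building block of such an A/B testing tool is deterministic bucketing, so that a returning user always sees the same variant without any per-user state. A hedged sketch; the hashing scheme is illustrative, not taken from the workshop:

```python
import hashlib

def assign_variant(user_id, experiment, variants=("A", "B")):
    """Sticky, stateless bucketing: hash (experiment, user) and map the
    digest onto a variant index. Same input -> same variant, every time."""
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# Assignment is stable across calls, so no lookup table is needed
assert assign_variant("u42", "checkout-test") == assign_variant("u42", "checkout-test")
print(assign_variant("u42", "checkout-test"))
```

Keying the hash on the experiment name as well as the user id keeps bucket assignments independent across experiments, which matters when several tests run concurrently.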
Near Realtime AI Deployment with Huge Data and Super Low Latency - Levi Brackman, by Sri Ambati
Published on Nov 2, 2018
This talk was recorded in London on October 30th, 2018 and can be viewed here: https://youtu.be/erHt-1yBuUw
Session: Travelport is a leading travel commerce platform with truly huge data and many complex needs in terms of processing, performance and latency. This talk demonstrates how we were able to harness big data technologies, H2O and cloud integration to deploy AI at scale and at low latency. The talk covers practical advice taken from our AI journey; you will learn the successful strategies and the pitfalls of near real-time retraining of ML models with streaming data, using all open source technologies.
Bio: As principal data scientist at Travelport, Levi Brackman leads a team of data scientists who are putting ML models into production. Prior to Travelport, Levi spent most of his career in the start-up world. He founded and led an organization that created innovative educational software applications and solutions used by high schools and youth organizations in the USA and Australia. Levi earned a PhD in the quantitative social sciences under the supervision of one of the world's leading educational psychologists. He earned a master's degree from University College London and is the author of a business book published in eight languages that was a bestseller in multiple countries. A native of North London (UK), Levi is married, has five children and now lives in Broomfield, Colorado.
The More the Merrier: Scaling Model Building Infrastructure at Zendesk, by Databricks
A significant amount of effort is required to transform a machine learning (ML) model into a useful machine learning product. Incorporating ML into real-world applications almost feels like "1% algorithm and 99% perspiration". I will share my team's experience building three ML products at Zendesk, and discuss some real-world problems and scaling complexities you may encounter when building these products at web scale. Close collaboration between product, engineering and data science is imperative to strike the balance between model performance, scalability and computational efficiency. The talk mainly focuses on scaling our model building infrastructure, with an aim to build at least 50,000 models a day, as part of our efforts to deliver an ML product called Content Cues. In a nutshell, Content Cues summarizes text from customer support tickets to form insightful topics. It combines multiple ML algorithms, including deep learning, clustering and other natural language processing approaches, run every day over data from tens of thousands of eligible Zendesk customers. My talk will cover the following topics:
- How we implement a horizontally scalable model building and model serving pipeline by combining AWS EMR, AWS Batch and Kubernetes
- How we tune the model building pipeline to optimize cost and efficiency without compromising resiliency
- Challenges in model monitoring, model versioning evolution and capturing user feedback
Speaker: Wai Chee Yau
As part of the IBM PartyCloud 2018 in Milan, the talk "A Journey into Data Science & AI" will present a case study about estimating Panelists Latent Affinities. I will show the components to develop an intelligent social agent able to classify entities and estimate latent affinities. The session will also cover good practices and common challenges faced by R&D organizations dealing with Machine Learning products.
Predicting Patient Outcomes in Real-Time at HCA, by Sri Ambati
Data Scientist Allison Baker and Development Manager of Data Products Cody Hall work with a talented team of data scientists, software engineers, and web developers, and are building the framework and infrastructure to support a real-time prediction application with the ability to scale across the entire company. Paramount to these efforts has been the capability of integrating the architecture for software production with the predictive models generated by H2O. This talk will review the processes by which HCA is building a pipeline to predict patient outcomes in real time, relying heavily on H2O's POJO scoring API and implemented in Clojure for data processing. #h2ony
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Megan Kurka, H2O.ai - AutoDoc with H2O Driverless AI - H2O World 2019 NYC, by Sri Ambati
This talk was recorded in NYC on October 22nd, 2019 and can be viewed here: https://youtu.be/aJJsrQHqsGg
AutoDoc with H2O Driverless AI
AutoDoc is the next logical step of the data science workflow: Driverless AI automatically documents and explains the processes used by the platform. AutoDoc frees the user from the time-consuming task of documenting and summarizing their workflow while building machine learning models. The resulting documentation gives users insight into the machine learning workflow created by Driverless AI, including details about the data used, the validation schema selected, model and feature tuning, and the final model created. With this capability in Driverless AI, users can focus on model insights and results.
Bio: Megan is a Customer Data Scientist at H2O. Prior to working at H2O, she worked as a Data Scientist building products driven by machine learning for B2B customers. She has experience working with customers across multiple industries, identifying common problems, and designing robust and automated solutions.
This document discusses how data science models have transitioned to the cloud to take advantage of greater computing resources. It notes that data science models are resource-intensive and traditionally required powerful local machines. The cloud allows data scientists to run models on cloud infrastructure for lower costs than high-end laptops and with access to many GPUs. Several major cloud platforms - Azure, AWS, and Google Cloud - are discussed and compared in terms of their machine learning offerings. The document also introduces Microsoft's Team Data Science Process, which aims to help data science teams collaborate more effectively on projects in the cloud.
Jeeves Grows Up: An AI Chatbot for Performance and QualityDatabricks
Sarah: CEO-Finance-Report pipeline seems to be slow today. Why?
Jeeves: SparkSQL query dbt_fin_model in CEO-Finance-Report is running 53% slower on 2/28/2021. Data skew issue detected. Issue has not been seen in last 90 days.
Jeeves: Adding 5 more nodes to cluster recommended for CEO-Finance-Report to finish in its 99th percentile time of 5.2 hours.
Who is Jeeves? An experienced Spark developer? A seasoned administrator? No, Jeeves is a chatbot created to simplify data operations management for enterprise Spark clusters. This chatbot is powered by advanced AI algorithms and an intuitive conversational interface that together provide answers to get users in and out of problems quickly. Instead of being stuck to screens displaying logs and metrics, users can now have a more refreshing experience via a two-way conversation with their own personal Spark expert.
We presented Jeeves at Spark Summit 2019. In the two years since, Jeeves has grown up a lot. Jeeves can now learn continuously as telemetry information streams in from more and more applications, especially SQL queries. Jeeves now “knows” about data pipelines that have many components. Jeeves can also answer questions about data quality in addition to performance, cost, failures, and SLAs. For example:
Tom: I am not seeing any data for today in my Campaign Metrics Dashboard.
Jeeves: 3/5 validations failed on the cmp_kpis table on 2/28/2021. Run of pipeline cmp_incremental_daily failed on 2/28/2021.
This talk will give an overview of the newer capabilities of the chatbot, and how it now fits in a modern data stack with the emergence of new data roles like analytics engineers and machine learning engineers. You will learn how to build chatbots that tackle your complex data operations challenges.
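One rule a bot like Jeeves might apply to detect the slowdowns described above, sketched with illustrative numbers: flag a run whose duration exceeds a high percentile of the pipeline's own history.

```python
def percentile(values, p):
    """Nearest-rank percentile (0 <= p <= 1) of a list of numbers."""
    s = sorted(values)
    idx = min(len(s) - 1, int(round(p * (len(s) - 1))))
    return s[idx]

def is_slow(history_secs, current_secs, p=0.99):
    """Flag the current run if it is slower than the p-th percentile
    of this pipeline's historical run times."""
    return current_secs > percentile(history_secs, p)

history = [3600, 3700, 3500, 3900, 4000]   # recent run durations, seconds
print(is_slow(history, 5200))  # True: well beyond the historical range
print(is_slow(history, 3600))  # False: a normal run
```

The detection is the easy part; the value of a chatbot like Jeeves is in attaching a diagnosis (data skew, failed validations) and a recommendation (add nodes) to the flag.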
H2O for Medicine and Intro to H2O in Python, by Sri Ambati
Erin LeDell presents on machine learning for medicine using the H2O platform. She discusses how electronic health records, genomic data, medical images, and data from wearables can be used with machine learning for applications like predictive diagnostics, prognosis, and remote patient monitoring. H2O is an open source machine learning platform that provides algorithms like deep learning, random forests, and gradient boosting in an easy-to-use interface. She demonstrates an EEG example that predicts eye state from brain signals.
This document discusses best practices for developing data science products at Philip Morris International (PMI). It covers:
- PMI's data science team of over 40 people across four hubs working on fraud prevention and other problems.
- Key principles for PMI's data science work, including being business-driven, investing in people, self-organizing, iterating to improve, and co-creating solutions.
- Challenges in data product development involving integrating work between data scientists and other teams, and practices like continuous integration/delivery to overcome these challenges.
- The role of data scientists in contributing code that is readable, testable, reusable, reproducible, and usable by other teams to integrate into their own systems.
Maruti Gollapudi has over 17 years of experience as a principal architect, specializing in digital customer experience. Some of his significant contributions include developing a data aggregation and analytics platform hosted on AWS that enables capabilities like social analytics, text analytics using NLP and machine learning, and enterprise search. He has experience building solutions leveraging technologies such as Java, JBoss, Kafka, MongoDB, Solr, Watson, and various analytics and social APIs. Recent projects include developing a headless CMS for page building and dynamic content modification for CNBC, and architecting a middleware for CNBC's integration with Uber to dynamically serve ride-related content.
Bio: Megan is a Customer Data Scientist at H2O. Prior to working at H2O, she worked as a Data Scientist building products driven by machine learning for B2B customers. She has experience working with customers across multiple industries, identifying common problems, and designing robust and automated solutions.
This document discusses how data science models have transitioned to the cloud to take advantage of greater computing resources. It notes that data science models are resource-intensive and traditionally required powerful local machines. The cloud allows data scientists to run models on cloud infrastructure for lower costs than high-end laptops and with access to many GPUs. Several major cloud platforms - Azure, AWS, and Google Cloud - are discussed and compared in terms of their machine learning offerings. The document also introduces Microsoft's Team Data Science Process, which aims to help data science teams collaborate more effectively on projects in the cloud.
Jeeves Grows Up: An AI Chatbot for Performance and QualityDatabricks
Sarah: CEO-Finance-Report pipeline seems to be slow today. Why
Jeeves: SparkSQL query dbt_fin_model in CEO-Finance-Report is running 53% slower on 2/28/2021. Data skew issue detected. Issue has not been seen in last 90 days.
Jeeves: Adding 5 more nodes to cluster recommended for CEO-Finance-Report to finish in its 99th percentile time of 5.2 hours.
Who is Jeeves? An experienced Spark developer? A seasoned administrator? No, Jeeves is a chatbot created to simplify data operations management for enterprise Spark clusters. This chatbot is powered by advanced AI algorithms and an intuitive conversational interface that together provide answers to get users in and out of problems quickly. Instead of being stuck to screens displaying logs and metrics, users can now have a more refreshing experience via a two-way conversation with their own personal Spark expert.
We presented Jeeves at Spark Summit 2019. In the two years since, Jeeves has grown up a lot. Jeeves can now learn continuously as telemetry information streams in from more and more applications, especially SQL queries. Jeeves now “knows” about data pipelines that have many components. Jeeves can also answer questions about data quality in addition to performance, cost, failures, and SLAs. For example:
Tom: I am not seeing any data for today in my Campaign Metrics Dashboard.
Jeeves: 3/5 validations failed on the cmp_kpis table on 2/28/2021. Run of pipeline cmp_incremental_daily failed on 2/28/2021.
This talk will give an overview of the newer capabilities of the chatbot, and how it now fits in a modern data stack with the emergence of new data roles like analytics engineers and machine learning engineers. You will learn how to build chatbots that tackle your complex data operations challenges.
H2O for Medicine and Intro to H2O in PythonSri Ambati
Erin LeDell presents on machine learning for medicine using the H2O platform. She discusses how electronic health records, genomic data, medical images, and data from wearables can be used with machine learning for applications like predictive diagnostics, prognosis, and remote patient monitoring. H2O is an open source machine learning platform that provides algorithms like deep learning, random forests, and gradient boosting in an easy to use interface. It demonstrates an EEG example to predict eye state from brain signals.
This document discusses best practices for developing data science products at Philip Morris International (PMI). It covers:
- PMI's data science team of over 40 people across four hubs working on fraud prevention and other problems.
- Key principles for PMI's data science work, including being business-driven, investing in people, self-organizing, iterating to improve, and co-creating solutions.
- Challenges in data product development involving integrating work between data scientists and other teams, and practices like continuous integration/delivery to overcome these challenges.
- The role of data scientists in contributing code that is readable, testable, reusable, reproducible, and usable by other teams to integrate into
Maruti Gollapudi has over 17 years of experience as a principal architect, specializing in digital customer experience. Some of his significant contributions include developing a data aggregation and analytics platform hosted on AWS that enables capabilities like social analytics, text analytics using NLP and machine learning, and enterprise search. He has experience building solutions leveraging technologies such as Java, JBoss, Kafka, MongoDB, Solr, Watson, and various analytics and social APIs. Recent projects include developing a headless CMS for page building and dynamic content modification for CNBC, and architecting a middleware for CNBC's integration with Uber to dynamically serve ride-related content.
Lambda Architecture and open source technology stack for real time big dataTrieu Nguyen
The document discusses the Lambda Architecture, which is an approach for building data systems to handle large volumes of real-time streaming data. It proposes using three main design principles: handling human errors by making the system fault-tolerant, storing raw immutable data, and enabling recomputation of results from the raw data. The document then provides two case studies of applying Lambda Architecture principles to analyze mobile app usage data and process high-volume web logs in real-time. It concludes with lessons learned, such as studying Lambda concepts, collecting any available data, and turning data into useful insights.
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTrivadis
This document provides an overview of artificial intelligence trends and applications in development and operations. It discusses how AI is being used for rapid prototyping, intelligent programming assistants, automatic error handling and code refactoring, and strategic decision making. Examples are given of AI tools from Microsoft, Facebook, and Codota. The document also discusses challenges like interpretability of neural networks and outlines a vision of "Software 2.0" where programs are generated automatically to satisfy goals. It emphasizes that AI will transform software development over the next 10 years.
Lambda architecture for real time big dataTrieu Nguyen
- The document discusses the Lambda Architecture, a system designed by Nathan Marz for building real-time big data applications. It is based on three principles: human fault-tolerance, data immutability, and recomputation.
- The document provides two case studies of applying Lambda Architecture - at Greengar Studios for API monitoring and statistics, and at eClick for real-time data analytics on streaming user event data.
- Key lessons discussed are keeping solutions simple, asking the right questions to enable deep analytics and profit, using reactive and functional approaches, and turning data into useful insights.
Agile Data Science is a lean methodology that is adopted from Agile Software Development. At the core it centers around people, interactions, and building minimally viable products to ship fast and often to solicit customer feedback. In this presentation, I describe how this work was done in the past with examples. Get started today with our help by visiting http://www.alpinenow.com
Analytic Excellence - Saying Goodbye to Old ConstraintsInside Analysis
The Briefing Room with Dr. Robin Bloor and Actian
Live Webcast August 6, 2013
http://www.insideanalysis.com
With all the innovations in compute power these days, one of the hardest hurdles to overcome is the tendency to think in old ways. By and large, the processing constraints of yesterday no longer apply. The new constraints revolve around the strategic management of data, and the effective use of business analytics. How can your organization take the helm in this new era of analysis?
Register for this episode of The Briefing Room to find out! Veteran Analyst Wayne Eckerson of The BI Leadership Forum, will explain how a handful of key innovations has significantly changed the game for data processing and analytics. He'll be briefed by John Santaferraro of Actian, who will tout his company's unique position in "scale-up and scale-out" for analyzing data.
Learn Data Science with Python course for B.TECH, BCA, MCA, BSC, MSC, B.COM, and statistical students. Data Science with python online training course with certified industry experts. Get a 100 % pre-placement guarantee.
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Yael Garten
2017 StrataHadoop SJC conference talk. https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/56047
Description:
So, you finally have a data ecosystem with Kafka and Hadoop both deployed and operating correctly at scale. Congratulations. Are you done? Far from it.
As the birthplace of Kafka and an early adopter of Hadoop, LinkedIn has 13 years of combined experience using Kafka and Hadoop at scale to run a data-driven company. Both Kafka and Hadoop are flexible, scalable infrastructure pieces, but using these technologies without a clear idea of what the higher-level data ecosystem should be is perilous. Shirshanka Das and Yael Garten share best practices around data models and formats, choosing the right level of granularity of Kafka topics and Hadoop tables, and moving data efficiently and correctly between Kafka and Hadoop and explore a data abstraction layer, Dali, that can help you to process data seamlessly across Kafka and Hadoop.
Beyond pure technology, Shirshanka and Yael outline the three components of a great data culture and ecosystem and explain how to create maintainable data contracts between data producers and data consumers (like data scientists and data analysts) and how to standardize data effectively in a growing organization to enable (and not slow down) innovation and agility. They then look to the future, envisioning a world where you can successfully deploy a data abstraction of views on Hadoop data, like a data API as a protective and enabling shield. Along the way, Shirshanka and Yael discuss observations on how to enable teams to be good data citizens in producing, consuming, and owning datasets and offer an overview of LinkedIn’s governance model: the tools, process and teams that ensure that its data ecosystem can handle change and sustain #DataScienceHappiness.
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Shirshanka Das
So, you finally have a data ecosystem with Kafka and Hadoop both deployed and operating correctly at scale. Congratulations. Are you done? Far from it.
As the birthplace of Kafka and an early adopter of Hadoop, LinkedIn has 13 years of combined experience using Kafka and Hadoop at scale to run a data-driven company. Both Kafka and Hadoop are flexible, scalable infrastructure pieces, but using these technologies without a clear idea of what the higher-level data ecosystem should be is perilous. Shirshanka Das and Yael Garten share best practices around data models and formats, choosing the right level of granularity of Kafka topics and Hadoop tables, and moving data efficiently and correctly between Kafka and Hadoop and explore a data abstraction layer, Dali, that can help you to process data seamlessly across Kafka and Hadoop.
Beyond pure technology, Shirshanka and Yael outline the three components of a great data culture and ecosystem and explain how to create maintainable data contracts between data producers and data consumers (like data scientists and data analysts) and how to standardize data effectively in a growing organization to enable (and not slow down) innovation and agility. They then look to the future, envisioning a world where you can successfully deploy a data abstraction of views on Hadoop data, like a data API as a protective and enabling shield. Along the way, Shirshanka and Yael discuss observations on how to enable teams to be good data citizens in producing, consuming, and owning datasets and offer an overview of LinkedIn’s governance model: the tools, process and teams that ensure that its data ecosystem can handle change and sustain #datasciencehappiness.
Top Artificial Intelligence Tools & Frameworks in 2023.pdfYamuna5
Artificial intelligence has facilitated the processing and use of data in the business world. With the growth of AI and ML, data scientists and developers now have more AI tools and frameworks to work with. We believe it's important for machine learning platforms to be easy to use for business people who need results, but also powerful enough for technical teams who want to push the boundaries of data analysis with customizable extensions. The key to success is choosing the right AI framework or machine learning library.
This document provides an overview of a workshop on Google Cloud Platform presented by Javed Habib, GDSC Lead at IIT Bhilai. The workshop covers introduction to cloud computing and Google Cloud architecture, hands-on labs for Google Cloud storage options, APIs, Pub/Sub, security, big data analysis using Dataflow and BigQuery, machine learning with Vertex AI and AutoML, and networking and security on Google Cloud including VPCs, load balancing, and firewalls.
This document provides an introduction to machine learning concepts and tools. It begins with an overview of what will be covered in the course, including machine learning types, algorithms, applications, and mathematics. It then discusses data science concepts like feature engineering and the typical steps in a machine learning project, including collecting and examining data, fitting models, evaluating performance, and deploying models. Finally, it reviews common machine learning tools and terminologies and where to find datasets.
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...DataWorks Summit
Each of today’s most forward-thinking enterprises have been forced to face similar data challenges: the reliance on real-time data to better serve their customers and, subsequently, the requirement of complying with regulations to protect that data – one example being the General Data Protection Regulation (GDPR).
The solution to this emerging challenge is a tricky one – for companies like ING, this data governance challenge has been met with metadata, a consistent view across a large heterogeneous ecosystem and collaboration with an active open source community.
This joint presentation, John Mertic – director of program management for ODPi – and Ferd Scheepers – Global Chief Information Architect of ING – will address the benefits of a vendor-neutral approach to data governance, the need for an open metadata standard, along with insight around how companies ING, IBM, Hortonworks and more are delivering solutions to this challenge as an open source initiative.
Speakers
John Mertic, Director of Program Management for ODPi, R Consortium, and Open Mainframe Project, The Linux Foundation
Maryna Strelchuk, Information Architect, ING
Data Science With Python | Python For Data Science | Python Data Science Cour...Simplilearn
This Data Science with Python presentation will help you understand what is Data Science, basics of Python for data analysis, why learn Python, how to install Python, Python libraries for data analysis, exploratory analysis using Pandas, introduction to series and dataframe, loan prediction problem, data wrangling using Pandas, building a predictive model using Scikit-Learn and implementing logistic regression model using Python. The aim of this video is to provide a comprehensive knowledge to beginners who are new to Python for data analysis. This video provides a comprehensive overview of basic concepts that you need to learn to use Python for data analysis. Now, let us understand how Python is used in Data Science for data analysis.
This Data Science with Python presentation will cover the following topics:
1. What is Data Science?
2. Basics of Python for data analysis
- Why learn Python?
- How to install Python?
3. Python libraries for data analysis
4. Exploratory analysis using Pandas
- Introduction to series and dataframe
- Loan prediction problem
5. Data wrangling using Pandas
6. Building a predictive model using Scikit-learn
- Logistic regression
This Data Science with Python course will establish your mastery of data science and analytics techniques using Python. With this Python for Data Science Course, you'll learn the essential concepts of Python programming and become an expert in data analytics, machine learning, data visualization, web scraping and natural language processing. Python is a required skill for many data science positions, so jumpstart your career with this interactive, hands-on course.
Why learn Data Science?
Data Scientists are being deployed in all kinds of industries, creating a huge demand for skilled professionals. Data scientist is the pinnacle rank in an analytics organization. Glassdoor has ranked data scientist first in the 25 Best Jobs for 2016, and good data scientists are scarce and in great demand. As a data you will be required to understand the business problem, design the analysis, collect and format the required data, apply algorithms or techniques using the correct tools, and finally make recommendations backed by data.
You can gain in-depth knowledge of Data Science by taking our Data Science with python certification training course. With Simplilearn Data Science certification training course, you will prepare for a career as a Data Scientist as you master all the concepts and techniques.
Learn more at: https://www.simplilearn.com
zData BI & Advanced Analytics Platform + 8 Week Pilot ProgramszData Inc.
This document describes zData's BI/Advanced Analytics Platform and Pilot Programs. The platform provides tools for storing, collaborating on, analyzing, and visualizing large amounts of data. It offers machine learning and predictive analytics. The platform can be deployed on-premise or in the cloud. zData also offers an 8-week pilot program that provides up to 1TB of data storage and full access to the platform's tools and services to test out the Big Data solution.
This talk given at the Hadoop Summit in San Jose on June 28, 2016, analyzes a few major trends in Big Data analytics.
These are a few takeaways from this talk:
- Adopt Apache Beam for easier development and portability between Big Data Execution Engines.
- Adopt stream analytics for faster time to insight, competitive advantages and operational efficiency.
- Accelerate your Big Data applications with In-Memory open source tools.
- Adopt Rapid Application Development of Big Data applications: APIs, Notebooks, GUIs, Microservices…
- Have Machine Learning part of your strategy or passively watch your industry completely transformed!
- How to advance your strategy for hybrid integration between cloud and on-premise deployments?
Should You Choose Java or Python for Data Science?Narola Infotech
We know the scientific libraries of Python. Time to see what libraries every Java web application development company needs to know!
Also known as Java Machine Learning. This library comes with the ability to develop calculative applications. These applications are capable of data processing, classification, and analysis. A Java development company will be able to master this library as machine learning is an expanding field.
Discover How IBM Uses InfluxDB and Grafana to Help Clients Monitor Large Prod...InfluxData
IBM uses InfluxDB to store metrics collected from nmon and Grafana to visualize those metrics. This helps IBM monitor large production servers and benchmark centers. Some key points:
- nmon was originally created 25 years ago by Nigel Griffiths to monitor OS performance but the data format and lack of central storage was limiting. nmon was updated to output JSON and line protocol for InfluxDB.
- Grafana provides various visualizations of the metrics stored in InfluxDB like donut graphs, line graphs, heat maps, and single stat/graph panels. This helps identify issues like busy periods and system bottlenecks.
- Ideas were discussed for better visualizing periodic trends like busy Fridays or batch over
The Latest Advances in Generative AI_ Exploring New Technology for Data Integ...Safe Software
Stay ahead of the curve with our upcoming webinar on the latest developments in Generative AI technology. We will dive into the state of Generative AI since our previous webinar in January, including the newly released Azure Open AI tool, and explore its potential applications in the technology industry. Our expert speakers will showcase how this cutting-edge tool can be leveraged in FME data integration workflows for natural language processing, automated workflow generation, and predictive modeling. Our team will also demonstrate the incredible power and productivity of the new OpenAIChatGPTConnector, which leverages the state-of-the-art gpt-3.5-turbo model. Don't miss out on this opportunity to learn from the best in the field and discover how Generative AI can revolutionize your data integration workflows. Register now to unlock the power of Generative AI!
London atlassian meetup 31 jan 2016 jira metrics-extract slides
1. Jira + Python + Data Analysis
tools + Agile concepts
=
Powerful Actionable Insights
Atlassian London User Group
By Rudiger Wolf
31 Jan 2016
2. Objective
You will be able to make better use of the data captured in your Jira instance.
Get at the data.
Process the data.
Present the data in an insightful way.
3. Agenda
Who is the speaker?
Context
Python a great all purpose tool
Actionable Agile
Jira-metrics-extract (Python Package)
The End
4. Who am I?
Studied Electrical & Electronics Engineering
Started Agile journey in 1998 with DSDM
Independent Consultant
Work with small and large organisations (Private and Public sector)
Couple of startups in Dot Com era
Most recently working at MOJ via Agilesphere
5. What's the first question you get from your customer about your release?
6. Excel to the rescue?
How many of you export issue data to Excel and then do further processing?
How many of those who export data automate the further processing?
How many of you use Visual Basic for Applications?
Anyone using office-js yet?
Automate to:
speed up, and
make your logic explicit
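Slide shorthand aside, "automate and make the logic explicit" can be sketched with pandas instead of VBA. The CSV columns below are invented for illustration; a real Jira export will use different headers.

```python
# Minimal sketch: replace a VBA post-processing step with pandas.
# The CSV content here stands in for a Jira issue export.
import io
import pandas as pd

csv_export = io.StringIO(
    "Key,Status,Story Points\n"
    "DEMO-1,Done,3\n"
    "DEMO-2,In Progress,5\n"
    "DEMO-3,Done,2\n"
)

df = pd.read_csv(csv_export)

# The logic is now explicit and repeatable: count issues and
# sum story points per status.
summary = df.groupby("Status")["Story Points"].agg(["count", "sum"])
print(summary)
```

Unlike a chain of spreadsheet formulas, this script can be versioned, reviewed, and re-run unchanged against next week's export.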
7. Consider learning Python
Python is a popular programming language that can help you work with your data.
Why Python?
Ease of Learning
It Runs on Any Platform
Great General-Purpose Language
Constantly Improving
Fantastic community
Extensive standard library
Thousands of 3rd party libraries
9. How to get started?
1 - Install Python [+ Git + text editor(Sublime,...) ]
2 - Create “Virtual Environment”
3 - Install Libraries/Packages/Modules
4 - Create your program
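On Linux or macOS, those four steps might look like the following (the environment name is arbitrary; the packages are the ones covered later in this talk):

```shell
# 1 - Python (plus Git and an editor) installed via your platform's installer
# 2 - Create and activate a virtual environment
python3 -m venv jira-env
source jira-env/bin/activate

# 3 - Install the libraries used in this talk
pip install jira pandas jira-metrics-extract

# 4 - Create and run your program
python -c 'print("hello from the venv")'
```

On Windows the activation step is `jira-env\Scripts\activate` instead.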
11. Libraries/Packages/Modules
https://pypi.org/project/jira/
This library eases the use of the JIRA REST API from Python and has been used in production for years. It is an open-source project maintained by the community.
https://pypi.org/project/pandas/
Designed to make working with structured tabular and time-series data easy and intuitive. Its goal is to become the most powerful and flexible open-source data analysis/manipulation tool available in any language.
https://pypi.org/project/jira-metrics-extract/
This utility extracts data from JIRA for processing with the ActionableAgile Analytics tool (https://www.actionableagile.com/analytics-tools/), or produces a version of the charts locally.
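As a flavour of how these libraries combine: once issue data is in a pandas DataFrame (built from the jira library or a jira-metrics-extract CSV), a flow metric such as cycle time is a few lines. The issue dates below are invented for illustration.

```python
# Sketch: cycle time (done date minus start date) with pandas.
# The issue data is invented; in practice it would come from the
# `jira` library or a jira-metrics-extract CSV.
import pandas as pd

issues = pd.DataFrame({
    "key": ["DEMO-1", "DEMO-2", "DEMO-3"],
    "started": pd.to_datetime(["2016-01-04", "2016-01-06", "2016-01-11"]),
    "done": pd.to_datetime(["2016-01-08", "2016-01-15", "2016-01-14"]),
})

# Elapsed days from start of work to completion, per issue
issues["cycle_time_days"] = (issues["done"] - issues["started"]).dt.days

print(issues[["key", "cycle_time_days"]])
print("85th percentile:", issues["cycle_time_days"].quantile(0.85))
```

A percentile like this ("85% of items finish within N days") is the kind of flow metric the Actionable Agile material builds on.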
12. The Jupyter Notebook
Is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.
http://jupyter.org/
Jupyter + jira-metrics-extract + a number of libraries = a fantastic interactive environment to create analysis “recipes”
https://www.oreilly.com/ideas/the-state-of-jupyter
http://try.jupyter.org https://notebooks.azure.com
13. Video Zero to Hero Jira Access
Run video.
How long does it take to get started extracting data from Jira?
Zero To Hero With Python and Jira 2min
https://www.youtube.com/watch?v=s4hVmiLR0jo
14. Actionable Agile
Just get the book, watch the conference videos…
https://leanpub.com/actionableagilemetrics
Actionable Agile Metrics for Predictability is a comprehensive guide on how to use flow metrics and analytics to get the predictability your customers crave.
By Daniel Vacanti
15. Other thought leaders in Probabilistic Planning
Troy Magennis – Agile Probabilistic forecasting quant @t_magennis
http://focusedobjective.com/
Larry Maccherone @LMaccherone
“I got this concept that every decision is a forecast and the reason I say that is that by picking alternative a, for instance, you are going to forecast that alternative a has a better outcome for you than alternatives b, c, d, and e.”
Larry Maccherone – Director of Analytics at AgileCraft. Prior to that, Larry led the Portfolio Insights product line at Rally Software.
https://www.infoq.com/articles/maccherone-agilecraft-data
16. Jira-metrics-extract
Install
Configure & Run
Data Extract file
Charts
Cumulative Flow Diagram
Release Burnup with forecast (Monte Carlo, based on past team performance)
Enhance with Jupyter notebook recipe
Enrich Sprint number, Pivot Table
Fill in missing sprint numbers, Extract dependencies
Visualise the dependencies in a clickable dependency map
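The Monte Carlo forecast mentioned above works, roughly, by resampling past throughput to simulate many possible futures and reading off a percentile. A toy version of the idea (the throughput history is invented; jira-metrics-extract derives the real numbers from Jira issue history):

```python
# Toy Monte Carlo forecast: resample past weekly throughput to estimate
# how many weeks are needed to finish a backlog.
import random

past_weekly_throughput = [3, 5, 2, 6, 4, 5, 3]  # issues completed per week
backlog = 40                                    # issues remaining
trials = 10_000

random.seed(42)  # reproducible for the example

def weeks_to_finish(backlog, history):
    """Simulate one possible future by sampling past weeks at random."""
    weeks, remaining = 0, backlog
    while remaining > 0:
        remaining -= random.choice(history)
        weeks += 1
    return weeks

outcomes = sorted(weeks_to_finish(backlog, past_weekly_throughput)
                  for _ in range(trials))

# The 85th-percentile outcome is a common "safe" forecast date.
print("85% of simulated futures finish within",
      outcomes[int(trials * 0.85)], "weeks")
```

The same resampling idea, driven by real historical data, is what produces the probabilistic release burnup chart.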
17. Demo Video
Configure and extract data 3min
https://youtu.be/HSk_aBQmvi0
Further Demos can show you how to:
* Enrich data extract
* Create Pivot Table
* Visualise Dependencies
18. The End
Knowledge to create an interactive environment
So that you can extract data and iterate toward analysis “recipes”
Quickly
Repeatably
Questions?
Twitter : @rnwolf