How to start becoming data-driven. Where to look for quick wins, how to gather data, how to set up process and methodology, what to strive for and what mistakes to guard against.
Statistical analysis can provide managers with useful insights if done properly. It allows companies to understand market trends from representative consumer data, avoiding reliance on assumptions. Statistics also enable quality control by measuring production processes to minimize variations and ensure consistency, reducing waste and warranty costs. However, statistics must be interpreted carefully as results can be influenced by flawed data or misused to push certain conclusions. The full context must be considered and statistical significance does not necessarily equal practical importance.
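To make the last point concrete, here is a minimal sketch (with hypothetical conversion numbers, not taken from the document) showing how a very large sample can make a practically negligible difference "statistically significant":

```python
import math

# Hypothetical A/B test: a huge sample and a tiny lift.
n = 1_000_000                     # visitors per variant
conv_a, conv_b = 50_000, 51_000   # 5.0% vs 5.1% conversion

p_a, p_b = conv_a / n, conv_b / n
p_pool = (conv_a + conv_b) / (2 * n)
se = math.sqrt(p_pool * (1 - p_pool) * 2 / n)
z = (p_b - p_a) / se              # two-proportion z-test

print(f"lift: {p_b - p_a:.2%}, z = {z:.2f}")   # z is roughly 3.2, p < 0.01
# Highly "significant", yet a 0.1 percentage point lift may not cover the
# cost of shipping and maintaining the change; significance is not importance.
```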
This document provides an overview of various quantitative forecasting techniques, including moving averages, trend analysis, exponential smoothing, ARIMA models, and econometric models. It describes when each technique is best used, their advantages and disadvantages, and provides examples. The techniques range from simple methods like moving averages to more complex approaches like ARIMA and econometric models, with the key being choosing the right technique based on the characteristics of the data and forecasting needs.
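As an illustration of the simpler end of that spectrum, the sketch below (made-up monthly sales figures, not an excerpt from the document) contrasts a moving-average forecast with simple exponential smoothing:

```python
# Toy data: twelve months of sales.
monthly_sales = [112, 118, 132, 129, 121, 135, 148, 136, 119, 104, 118, 115]

def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    return sum(series[-window:]) / window

def exponential_smoothing_forecast(series, alpha=0.3):
    """Simple exponential smoothing: recent observations get weight alpha."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

print(moving_average_forecast(monthly_sales))         # mean of the last 3 months
print(exponential_smoothing_forecast(monthly_sales))  # smoothed current level
```

The moving average reacts only to the most recent window, while exponential smoothing weights the whole history with geometrically decaying weights; which is appropriate depends on how quickly the underlying level shifts.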
MIP: Analysis of metadata and data revisions (Dario Buono)
This document summarizes an analysis of metadata and data revisions for the Macroeconomic Imbalance Procedure (MIP) scoreboard indicators. It discusses the results of an expert opinion poll on the likelihood and impact of revisions to MIP indicators. It also analyzes revisions to MIP scoreboard data from 2007-2011 using various statistical measures to identify indicators that experience large revisions that could provide new information versus small revisions that may just represent noise. Key findings include interesting results for several indicators like deflated house prices, financial accounts, and balance of payments. The document advocates further refinement of the expert opinion methodology and performing revision analysis using more statistics, additional data vintages, and higher frequency data.
This document discusses the role of statistics in business decision making. It describes descriptive statistics, which presents data in a way that is easier to understand through charts and graphs. Descriptive statistics measures central tendency and the spread of data using metrics like mean, median, mode, range, and standard deviation. The document also covers inferential statistics, which analyzes data samples to estimate parameters and test hypotheses. Examples are given of how statistics are used in various business contexts like Wall Street analysis and clothing design to draw conclusions from raw data and inform future decisions.
The document provides guidance on effectively presenting data through good visualization techniques. It discusses choosing the appropriate type of chart, graph or table based on the data, message and audience. Key principles include keeping visualizations simple, including only necessary details, and using labels and captions to guide the reader's understanding of the data. The goal is to help the reader understand the overall message or story the data is trying to convey.
These are the slides used for a two-hour course on data visualisation. The course was aimed at biologists, so most examples come from scientific publications in that field (but not only).
This document discusses effective data visualization techniques. It notes that data can be encoded using visual cues like size, shape and color. Experiments have shown that some cues, like position on an axis, are better perceived by humans than others for conveying quantitative information. Effective charts depend on the question being answered and may use bars, lines, areas or color to show comparisons, trends, compositions or identities.
Segmentation is key to effectively addressing and converting potential customers. Simon Belak, head of analytics at GoOpti and transmedia editor at the critical newspaper Tribuna, showed how to discover segments from data.
In his words, it is wholly unjustified that segmentation is mostly static and done blindly, without regard for the data. In the talk he presented an alternative: analytical, partly automated discovery of segments from data.
Using concrete examples, he showed how to map data about customer interactions (page visits as indicators of interest, survey answers, on-site navigation patterns, email opens, and so on) into a customer model, and then how to divide that model into segments. Simon concluded by pointing out the most common pitfalls and small tricks for cases where data is scarce or ambiguous.
We instrumented a 15k LOC codebase with spec so you don't have to (but probably should). Validation; testing; destructuring; composable "data macros" via conformers; we've tried spec in all its multifaceted glory. This talk is a distillation of lessons learned interspersed with musings on how spec alters development flow and one's thinking.
Presented at EuroClojure 2016
The time is out of joint: O cursed spite, / That ever I was born to set it ri... (Simon Belak)
The document discusses concepts related to time, concurrency, and functional programming. It explores ideas such as multiple intersecting timelines representing concurrency, action occurring sequentially while perception is parallel, and functional reactive programming involving inversion of control. Various concurrency models like shared-isolated, asynchronous, synchronous, coordinated, and autonomous are also mentioned.
This document discusses doing data science with Clojure. It notes that Clojure excels at structure manipulation and encoding through functions over collections without rigid data structures. This allows for composable and fast data analysis in a way that focuses on the intent through consistent APIs and currying. Live programming is also discussed as a way to catch errors early and enable faster iteration through more context and easier debugging. The ecosystem of Clojure tools is presented as facilitating tasks like machine learning, plotting, and using notebooks as dashboards.
Whenever a programming language comes out with a new feature, we smug lisp weenies shrug and point out how lisp had that in the early seventies; and if you look at the list of influences of a given language, there is bound to be a lisp in there. In this talk I will try to unpack what makes lisp special, why it is called "the programmable programming language", how it changes one's thinking, and how that thinking can be applied elsewhere.
Having programmers do data science is terrible, if only everyone else were not even worse. The problem is of course tools. We seem to have settled on either: a bunch of disparate libraries thrown into a more or less agnostic IDE, or some point-and-click wonder which no matter how glossy, never seems to truly fit our domain once we get down to it. The dual lisp tradition of grow-your-own-language and grow-your-own-editor gives me hope there is a third way. This presentation is a meditation on how I approach data problems with Clojure, what I believe the process of doing data science should look like and the tools needed to get there. Some already exist (or can at least be bodged together); others can be made with relative ease (and we are already working on some of these); but a few will take a lot more hammock time.
Talk delivered at :clojureD 2016 http://www.clojured.de/
Having programmers do data science is terrible, if only everyone else were not even worse. The problem is of course tools. We seem to have settled on either: a bunch of disparate libraries thrown into a more or less agnostic IDE, or some point-and-click wonder which no matter how glossy, never seems to truly fit our domain once we get down to it. The dual lisp tradition of grow-your-own-language and grow-your-own-editor gives me hope there is a third way.
This talk is a meditation on the ideal environment for doing data science and how to (almost) get there. I will cover how I approach data problems with Clojure (and why Clojure in the first place), what I believe the process of doing data science should look like and the tools needed to get there. Some already exist (or can at least be bodged together); others can be made with relative ease (and we are already working on some of these); but a few will take a lot more hammock time.
Successfully forecasting future demand is key to allowing GoOpti its low prices while isolating transport partners from risk. In this talk Simon Belak, Chief Data Scientist at GoOpti, will take you through how he approaches forecasting and the lessons he learned along the way. The focus is going to be on models that do not require excessive amounts of data, are legible and work well as part of a continuous process (rather than being a one-off problem).
Clojure has always been good at manipulating data. With the release of spec and Onyx (“a masterless, cloud scale, fault tolerant, high performance distributed computation system”) good became best. In this talk you will learn about a data layer architecture built around Kafka and Onyx that is self-describing, declarative, scalable and convenient to work with for the end user. The focus will be on the power and elegance of describing data and computation with data; and the inferences and automations that can be built on top of that.
This document discusses becoming a data-driven organization. It recommends investing in robust data extraction and loading processes. Quick wins should be obvious improvements that were overlooked previously. Analyses should minimize friction and experimentation is important. Measurements should be focused on distributions rather than single numbers, capturing external relationships. The checklist involves investing in data, finding quick wins, being methodical, focusing on one key performance indicator at a time while changing often, and avoiding stale metrics.
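To make the point about distributions versus single numbers concrete, here is a small, hypothetical illustration in Python (the order values are invented; nothing here comes from the original deck):

```python
import statistics

# Many small orders plus a few large ones: two distinct customer segments.
orders = [12, 14, 15, 13, 16, 14, 15, 210, 190, 205]

print(statistics.mean(orders))             # ~70: describes nobody
print(statistics.median(orders))           # 15: the typical order
print(statistics.quantiles(orders, n=10))  # deciles expose the two segments
```

A single average blends the segments into a number that describes neither, while the distribution (or a per-segment breakdown) shows where the money actually comes from.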
In this talk, you will discover how a 15k LOC codebase was instrumented with spec so you don't have to (but probably should). Validation; testing; destructuring; composable “data macros” via conformers; we’ve tried spec in all its multifaceted glory. You will come away with a distillation of lessons learned interspersed with musings on how spec alters development flow and one’s thinking.
@sbelak
Simon Belak
Using Onyx in anger
Clojure has always been good at manipulating data. With the release of spec and Onyx ("masterless, cloud scale, fault tolerant, high performance distributed computation system") good became best. In this talk I will walk you through a data layer architecture built around Kafka and Onyx that is self-describing, declarative, scalable and convenient to work with for the end user. The focus will be on the power and elegance of describing data and computation with data; and the inferences and automations that can be built on top of that.
This document discusses architecting large systems and provides examples from the presenter's experience. Some key points:
- Requirements for large IT projects are often wrong, with users asking for new computer systems to solve problems that may be better addressed through process changes. Analysts are needed to understand true needs and ensure feasibility.
- Solution selection is frequently based on subjective preferences rather than objective evaluation. Justification often follows the decision rather than informing it. Buy vs build decisions should consider simplicity, scope, and ongoing costs.
- Successful implementation requires focusing on people - having the right project managers, consultants, customers and internal staff - rather than technologies. It is important to prioritize and know when to stop expanding.
Using Web Data to Drive Revenue and Reduce Costs (Connotate)
This presentation is designed to help companies strengthen their competitive advantage by leveraging publicly available Web sources.
Entrepreneurs, global industry leaders and enterprises of all sizes are turning Web data into lucrative opportunities – creating new revenue-generating products, reducing costs and re-engineering workflows to optimize pricing, streamline reporting, ensure compliance, engage interactively with clients and more.
This presentation uses a variety of success stories to illustrate ways in which businesses can use Web data to drive revenue and streamline operations.
Data Refinement: The missing link between data collection and decisions (Vivastream)
The document discusses the importance of data refinement between data collection and decision making. It emphasizes the need to transform raw data into useful insights through techniques like data summarization, categorization, and predictive modeling in order to provide accurate marketing answers and improve targeting, costs, and results. Specifically, it recommends structuring data into a model-ready environment, creating descriptive variables from transaction histories, matching data to the appropriate analytical goals and levels, and categorizing non-numeric attributes.
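As a rough sketch of "creating descriptive variables from transaction histories", the following pandas snippet (hypothetical column names and data, not from the document) rolls raw transactions up into one model-ready row per customer:

```python
import pandas as pd

# Raw transaction log: one row per purchase.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3],
    "amount":      [20.0, 35.0, 15.0, 120.0, 80.0, 9.99],
    "date": pd.to_datetime(
        ["2024-01-05", "2024-02-11", "2024-03-02",
         "2024-01-20", "2024-03-15", "2024-02-28"]),
})

# One row per customer with descriptive variables suitable for modeling.
features = transactions.groupby("customer_id").agg(
    n_orders=("amount", "size"),
    total_spend=("amount", "sum"),
    avg_order=("amount", "mean"),
    last_purchase=("date", "max"),
)
features["days_since_last"] = (
    pd.Timestamp("2024-04-01") - features["last_purchase"]
).dt.days
print(features)
```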
This document discusses strategies for estimating software development project delivery. It will cover traditional and Agile techniques for estimation, including examining the purpose of estimates, differences between estimates and guarantees, and how estimation works in Scrum and Kanban environments. Attendees will learn about estimation strategies as a project manager or developer working with business partners.
The future for performance management, quality and true continuous improvement for local council planning services. Uses much of the data that councils already send to government, supplements it with some new approaches to customer and quality feedback, and brings it all together in one tidy, holistic report.
06/18/2014 - Billing & Payments Engineering Meetup @ Netflix
For this Meetup, we have invited speakers from several tech companies to give a series of lightning talks on challenges related to billing & payments systems.
This event is for engineers who are interested in learning more about billing & payments systems. No previous experience with this kind of system is required to attend.
Presenters:
- Mathieu Chauvin - Engineering Manager for Payments @ Netflix
- Taylor Wicksell - Sr. Software Engineer for Billing @ Netflix
- Jean-Denis Greze - Engineer @ Dropbox
- Alec Holmes - Software Engineer @ Square
- Emmanuel Cron - Software Engineer III, Google Wallet @ Google
- Paul Huang - Engineering Manager @ Survey Monkey
- Anthony Zacharakis - Lead Engineer @ Lumos Labs
- Shengyong Li / Feifeng Yang - Dir. Engineering Commerce / Tech Lead Payment @ Electronic Arts
Find it! Nail it! Boosting e-commerce search conversions with machine learnin... (Rakuten Group, Inc.)
The document discusses learning-to-rank models for improving search relevance in e-commerce. It describes how traditional information retrieval models do not scale well to modern needs, while learning-to-rank methods can handle thousands of features and implicit user feedback data. The document reports that using listwise learning-to-rank with NDCG as the loss function improved NDCG by 15.6% and increased conversion rates by 7.5% on e-commerce data. It concludes that deep neural network methods may now outperform traditional machine learning for information retrieval tasks.
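For reference, NDCG, the metric reported above, can be computed as follows; the relevance grades in the example are made up and this is one standard textbook formulation, not necessarily the exact variant used in the document:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain for a ranked list of relevance grades."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))

def ndcg(ranked_relevances):
    """DCG normalised by the DCG of the ideal (perfectly sorted) ranking."""
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal_dcg if ideal_dcg > 0 else 0.0

# Relevance of results as the ranker ordered them (3 = perfect, 0 = irrelevant).
print(ndcg([3, 2, 0, 1]))   # below 1.0: a relevant item is ranked too low
print(ndcg([3, 2, 1, 0]))   # 1.0: ideal ordering
```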
Return to Basics: Supply Chain Re-design .. 'ISC' Turkey 2015 (Walaa Maher)
The basics of supply chain/network redesign. Not saying it is the best approach or that it works 100% of the time, just that it has worked for me every time in every company... and it is themed with Frank Sinatra :)
How to Leverage Analytics, Design, and Development to Transform Customer Jour... (Qualtrics)
Learn how McKinsey harnesses big data in experience. This session outlines the concept of leveraging big data, how to prioritize and map journeys, the benefits of setting up cross channel groups, and how to carve out implementation capacity.
This document discusses the concept of technical debt in software development. It defines technical debt as the implied cost of additional rework caused by choosing an easy short-term solution rather than a better long-term approach. It then provides examples of common causes of technical debt, such as business pressure and lack of testing. Finally, it discusses strategies for managing technical debt, such as prioritizing and reducing existing debt, using debt ceilings to avoid accumulating too much debt, and focusing on recent code changes when reducing debt.
We explain the history of our agile organization with a focus on the latest round of evolution of our Product and Engineering organization, moving from business-oriented feature teams to mission teams.
Delivering Aha Moments through Procurement Performance Analytics (part II) (Dan Traub)
2016 NAEP Annual Meeting presentation by Dan Traub of FinVantage Solutions LLC and Sandy Hicks of the University of Colorado. Delivered in San Antonio, TX.
In this update to the 2015 presentation, we explore the rollout and impact of the departmental scorecard program as it was introduced to over 215 financial managers across the University of Colorado System.
Learn how the University of Colorado Procurement Service Center launched an innovative program to gain valuable insights into its performance using the latest data visualization tools. Utilizing and analyzing data from multiple sources, CU has built a 360-degree platform to explore outcomes from procurement, payables, travel management, payment card, customer outreach, and other functions. The transition to a data-driven mindset will be examined, including a few surprises they learned along the way. The team will demonstrate how these metrics are driving new levels of collaboration with campus stakeholders, thanks to the popular departmental scorecards and executive dashboards.
Extending Data Lake using the Lambda Architecture, June 2015 (DataWorks Summit)
The document discusses using a Lambda architecture to extend a data lake with real-time capabilities. It describes considerations for choosing a real-time architecture and common use cases. Specific examples discussed include using real-time architectures for patient critical care in healthcare and customer engagement in marketing.
Performance measures are commonly used to track and improve our facilities operations. But some Facility Managers are not comfortable developing and using them. This presentation outlines some easy to understand ways to develop metrics that are useful. It will also examine ways that well-intended performance measures contribute to the wrong outcomes.
This document discusses how lean thinking and digital capabilities can be combined to create innovative solutions for businesses. It advocates using an "ideal state" approach to redesign processes and services, bringing in digital technologies where relevant. The document provides examples of how organizations have used this approach to dramatically improve processes like supplier services, travel and expense claims, and social housing lettings. It emphasizes starting with the ideal customer experience rather than technology, and involving frontline teams in co-creating new solutions.
Making your analytics talk business | Big Data Demystified (Omid Vahdaty)
MAKING YOUR ANALYTICS TALK BUSINESS
Aligning your analysis to the business is fundamental for all types of analytics (digital or product analytics, business intelligence, etc.) and is vertical- and tool-agnostic. In this talk we will build on the discussion that was started in the previous meetup, and will discuss how analysts can learn to derive their stakeholders' expectations, how to shift from metrics to "real" KPIs, and how to approach an analysis in order to create real impact.
This session is primarily geared towards those starting out into analytics, practitioners who feel that they are still struggling to prove their value in the organization or simply folks who want to power up their reporting and recommendation skills. If you are already a master at aligning your analysis to the business, you're most welcome as well: join us to share your experiences so that we can all learn from each other and improve!
Bios:
Eliza Savov - Eliza is the team lead of the Customer Experience and Analytics team at Clicktale, the worldwide leader in behavioral analytics. She has extensive experience working with data analytics, having previously worked at Clicktale as a senior customer experience analyst, and as a product analyst at Seeking Alpha.
Similar to Turn to data-driven: the first 6 months
The document discusses tools for building the future and their impact. It notes that the speed of iteration matters and that countless hours are lost building administrative interfaces and integrations. It advocates building using a library of standardized reusable components and learning from blockchains about openness and lowering friction. Trends to watch include no-code, augmenting human intelligence with AI, and API-first and systems-level thinking.
Having programmers do data science is terrible, if only everyone else were not even worse. The problem is of course tools. We seem to have settled on either: a bunch of disparate libraries thrown into a more or less agnostic IDE, or some point-and-click wonder which no matter how glossy, never seems to truly fit our domain once we get down to it. The dual lisp tradition of grow-your-own-language and grow-your-own-editor gives me hope there is a third way.
This presentation is a meditation on how I approach data problems with Clojure, what I believe the process of doing data science should look like and the tools needed to get there. Some already exist (or can at least be bodged together); others can be made with relative ease (and we are already working on some of these); but a few will take a lot more hammock time.
Clojure is fantastic for data manipulation and rapid prototyping, but falls short when it comes to communicating your insights. What is lacking are good visualization libraries and (shareable) notebook-like environments. I'll show my workflow in org-babel which weaves Clojure with R (for ggplot) and Python (for scikit-learn) and tell you why it's wrong, how IPythons of the world have trapped us in a local maximum and how we need a reconceptualization similar to what a REPL does to programming. All this interposed with my experience doing data science with Clojure (everything from ETL to on-the-spot analysis during a brainstorming).
The document discusses tools for data analysis and building intelligence including Metabase, an open source business intelligence tool used by over 21,000 companies daily. It focuses on speeding up the time it takes to answer questions from data through automation and building a "data scientist in a box". The goal is to answer 80% of questions from data in under 20 minutes to facilitate real-time exploration and problem solving.
The document provides guidance on leveling up a company's data infrastructure and analytics capabilities. It recommends starting by acquiring and storing data from various sources in a data warehouse. The data should then be transformed into a usable shape before performing analytics. When setting up the infrastructure, the document emphasizes collecting user requirements, designing the data warehouse around key data aspects, and choosing technology that supports iteration, extensibility and prevents data loss. It also provides tips for creating effective dashboards and exploratory analysis. Examples of implementing this approach for two sample companies, MESI and SalesGenomics, are discussed.
Recommendation algorithms and their variations such as ranking are the most common way for machine learning to find its way into a product where it is not the main focus. In this talk we’ll dig into the subtleties of making recommendation algorithms a seamless and integral part of your UX (goal: it should completely fade into the background. The user should not be aware she’s interacting with any kind of machine learning, it should just feel right, perhaps smart or even a tad like cheating); how to solve the cold start problem (and having little training data in general); and how to effectively collect feedback data. I’ll be drawing from my experiences building Metabase, an open source analytics/BI tool, where we extensively use recommendations and ranking to keep users in a state of flow when exploring data; to help with discoverability; and as a way to gently teach analysis and visualization best practices; all on the way towards building an AI data scientist.
This document summarizes Metabase, an open source business intelligence and analytics tool that runs on-premise and is data-agnostic. Metabase is used by over 13,000 companies daily, including Go-Jek which has 4,000 daily active users. Some common use cases for Metabase include exploratory analysis, product development, product analytics, support, customer success, BI dashboarding, and marketing. The document also discusses how Metabase can be used for data-driven product development, such as segmenting users by usage and analyzing feature usage.
In this talk we will look at how to efficiently (in both space and time) summarize large, potentially unbounded, streams of data by approximating the underlying distribution using so-called sketch algorithms. The main approach we are going to be looking at is summarization via histograms. Histograms have a number of desirable properties: they work well in an on-line setting, are embarrassingly parallel, and are space-bound. Not to mention they capture the entire (empirical) distribution which is something that otherwise often gets lost when doing descriptive statistics. Building from that we will delve into related problems of sampling in a stream setting, and updating in a batch setting; and highlight some cool tricks such as capturing time-dynamics via data snapshotting. To finish off we will touch upon algorithms to summarize categorical data, most notably count-min sketch.
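As a taste of the sketch-algorithm idea, here is a toy count-min sketch in Python (the talk itself uses Clojure); the hashing and sizing are simplified for illustration and should not be taken as a production implementation:

```python
import random

class CountMinSketch:
    """Approximate frequency counts in fixed space: d hash rows of width w."""

    def __init__(self, width=1024, depth=4, seed=42):
        rng = random.Random(seed)
        self.width = width
        self.tables = [[0] * width for _ in range(depth)]
        self.salts = [rng.getrandbits(32) for _ in range(depth)]

    def _index(self, item, salt):
        return hash((salt, item)) % self.width

    def add(self, item, count=1):
        for table, salt in zip(self.tables, self.salts):
            table[self._index(item, salt)] += count

    def estimate(self, item):
        # Collisions only inflate counts, so the minimum over rows is the
        # tightest estimate; it never underestimates.
        return min(table[self._index(item, salt)]
                   for table, salt in zip(self.tables, self.salts))

cms = CountMinSketch()
for word in ["a", "b", "a", "c", "a"]:
    cms.add(word)
print(cms.estimate("a"))  # 3 (possibly overestimated, never underestimated)
```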
Transducers -- composable algorithmic transformations decoupled from input or output sources -- are Clojure's take on data transformation. In this talk we will look at what makes a transducer; push their composability to the limit chasing the panacea of building complex single-pass transformations out of reusable components (e.g. calculating a bunch of descriptive statistics like sum, sum of squares, mean, variance, ... in a single pass without resorting to a spaghetti-ball fold); explore how the fact that they are decoupled from input and output traversal opens up some interesting possibilities, as they can be made to work in both online and batch settings; all drawing from practical examples of using Clojure to analyze "awkward-size" data.
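The talk's examples are Clojure transducers; as a language-agnostic analogue of the "single pass, no spaghetti fold" idea, here is an online computation of count, mean and variance (Welford's algorithm) in Python:

```python
def online_stats(stream):
    """Compute count, mean and sample variance in one pass over the data."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)     # running sum of squared deviations
    variance = m2 / (n - 1) if n > 1 else 0.0
    return {"count": n, "mean": mean, "variance": variance}

print(online_stats(iter([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])))
# {'count': 8, 'mean': 5.0, 'variance': ~4.57}
```

Because the stream is consumed exactly once, the same loop works for unbounded data, and the running state (count, mean, m2) can be merged across partitions, which is what makes this style embarrassingly parallel.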
According to the document, your metrics are probably wrong. It lays out the reasons why metrics go wrong and recommends how to improve them. Specifically, it recommends thinking in terms of distributions and segmentation rather than aggregates, understanding that populations are dynamic rather than static, determining what is signal versus noise, considering reference points and reproducibility, and documenting metric definitions.
Writing correct smart contracts is hard (a recent study estimated that 3% of Ethereum contracts in the wild have some sort of security vulnerability; we all know of the DAO and Parity exploits, …). There are two main reasons for this. First and foremost, developing for the blockchain is quite different from what most programmers are used to. The level of concurrency is far beyond our (von Neumann) intuition and mental models. And you can’t stop and inspect running code like you can in other systems. Taken together blockchain is closer to a physical/living system than conventional software — a fact not reflected in the tools available. Compared to other domains our tooling and programming languages are somewhere between rudimentary and bad; and a far cry from where they would need to be to augment developers and help make programming for the blockchain less alien and less error-prone. In this talk we will first unpack what makes programming for the blockchain hard, and what the most common types of vulnerabilities and their causes are. Then we will look at the state of the art in programming language research on correctness proving and programming massively concurrent systems, and how these can be applied to programming smart contracts; revisit some technologies from the past that didn’t get traction at the time, but are nevertheless worth studying; and finish off by trying to imagine how programming for the blockchain should, and perhaps one day will, look.
Online statistical analysis using transducers and sketch algorithms (Simon Belak)
Online statistical analysis using transducers and sketch algorithms. Don’t know what either is? You are going to learn something very cool (and perspective-changing) then. Know them, but want an experience report? Got you covered, fam.
OpenAI recently published a fun paper where they showed that evolution strategies can train policy networks to perform on par with state-of-the-art deep reinforcement learning. In this talk we’ll try to reimplement the main ideas in that paper using Neanderthal (blazing fast matrix and linear algebra computations) and Cortex (neural networks); make it massively distributed using Onyx; build a simulation environment using re-frame; and of course save our princess from no particular harm in our toy game example.
How to systematically open a new market where every step is supported by data, how to set up learning loops, and where to look for optimization opportunities.
You can do cool and unexpected things if your entire type system is a first class citizen and accessible at runtime.
With the introduction of spec, Clojure got its own distinct spin on a type system. Just as macros add another "-time" (compile time, alongside runtime) where the full power of the language can be used, spec does the same for describing data.
The result is an entire additional type system that is a first class citizen and accessible at runtime, facilitating validation, generative testing (a la QuickCheck), destructuring (pattern matching into deeply nested data), data macros (recursive transformations of data) and a pluggable error system. And then you can start building on top of it.
The talk will be half introduction to spec and the ideas packed within it, and half experience report on instrumenting a 15k LOC production codebase (primarily ETL and analytics) with spec.
Clojure has always been good at manipulating data. With the release of spec and Onyx (“a masterless, cloud scale, fault tolerant, high performance distributed computation system”) good became best. In this talk you will learn about a streaming data layer architecture built around Kafka and Onyx that is self-describing, declarative, scalable and convenient to work with for the end user. The focus will be on the power and elegance of describing data and computation with data; the inferences and automations that can be built on top of that; and how and why Clojure is a natural choice for tasks that involve a lot of data manipulation, touching both on functional programming and lisp-specifics such as code-is-data.
We will look at how such an approach can be used to manage a data warehouse by automatically inferring materialized views from raw incoming data or other views based on a combination of heuristics, statistical analysis (seasonality, outlier removal, ...) and predefined ontologies. Doing so is a practical way to maintain a large number of views, increasing their availability and abstracting the complexity into declarative rules, rather than having an ETL pipeline with dozens or even hundreds of hand crafted tasks.
The system described requires relatively little effort upfront but can easily grow with one's needs both in terms of scale as well as scope. With its good introspection capabilities and strong decoupling it is for instance an excellent substrate for putting machine learning algorithms in production, which is the final use-case we will dive into.
So you want to do analytics. If your first thought was: “I’ll just query the production DB”, you are in for a world of pain. The alternative is to spend some time thinking upfront and have a lovely analytics infrastructure everyone loves to use. In this talk I’ll show you what I learned from my mistakes and what is my go-to stack and topology.
The document discusses lean startup methodology and achieving product-market fit. It emphasizes that product-market fit is a process, not a single event. It also stresses the importance of using data and metrics to guide decisions and measure progress towards product-market fit over time. Key metrics mentioned include customer acquisition cost, monthly recurring revenue, churn, and the financial impact of metrics.
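As a back-of-the-envelope illustration of how the metrics listed above fit together (all numbers invented), unit economics are often summarized as lifetime value versus customer acquisition cost:

```python
# Hypothetical SaaS numbers for illustration only.
monthly_churn = 0.05          # 5% of customers lost per month
arpu = 30.0                   # average revenue per user per month
gross_margin = 0.8
cac = 150.0                   # customer acquisition cost

avg_lifetime_months = 1 / monthly_churn            # ~20 months
ltv = arpu * gross_margin * avg_lifetime_months    # ~480
print(f"LTV = {ltv:.0f}, LTV/CAC = {ltv / cac:.1f}")  # ratio ~3.2
```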
This document outlines the typical structure and content for an investor pitch deck, including sections on the founder story, market opportunity, product, business model, team, financials, and request. It provides guidance on the key elements to cover in each section, such as problem/solution fit in the founder story, market size and competitive landscape in the market story, revenue model and financial projections in the business model and financials sections. The overall goal is to tell a cohesive story about the problem being solved, market opportunity, solution, business model and team through compelling narratives and metrics.
Discover the cutting-edge telemetry solution implemented for Alan Wake 2 by Remedy Entertainment in collaboration with AWS. This comprehensive presentation dives into our objectives, detailing how we utilized advanced analytics to drive gameplay improvements and player engagement.
Key highlights include:
Primary Goals: Implementing gameplay and technical telemetry to capture detailed player behavior and game performance data, fostering data-driven decision-making.
Tech Stack: Leveraging AWS services such as EKS for hosting, WAF for security, Karpenter for instance optimization, S3 for data storage, and OpenTelemetry Collector for data collection. EventBridge and Lambda were used for data compression, while Glue ETL and Athena facilitated data transformation and preparation.
Data Utilization: Transforming raw data into actionable insights with technologies like Glue ETL (PySpark scripts), Glue Crawler, and Athena, culminating in detailed visualizations with Tableau.
Achievements: Successfully managing 700 million to 1 billion events per month at a cost-effective rate, with significant savings compared to commercial solutions. This approach has enabled simplified scaling and substantial improvements in game design, reducing player churn through targeted adjustments.
Community Engagement: Enhanced ability to engage with player communities by leveraging precise data insights, despite having a small community management team.
This presentation is an invaluable resource for professionals in game development, data analytics, and cloud computing, offering insights into how telemetry and analytics can revolutionize player experience and game performance optimization.
06-18-2024-Princeton Meetup-Introduction to Milvus (Timothy Spann)
06-18-2024-Princeton Meetup-Introduction to Milvus
tim.spann@zilliz.com
https://www.linkedin.com/in/timothyspann/
https://x.com/paasdev
https://github.com/tspannhw
https://github.com/milvus-io/milvus
Get Milvused!
https://milvus.io/
Read my Newsletter every week!
https://github.com/tspannhw/FLiPStackWeekly/blob/main/142-17June2024.md
For more cool Unstructured Data, AI and Vector Database videos check out the Milvus vector database videos here
https://www.youtube.com/@MilvusVectorDatabase/videos
Unstructured Data Meetups -
https://www.meetup.com/unstructured-data-meetup-new-york/
https://lu.ma/calendar/manage/cal-VNT79trvj0jS8S7
https://www.meetup.com/pro/unstructureddata/
https://zilliz.com/community/unstructured-data-meetup
https://zilliz.com/event
Twitter/X: https://x.com/milvusio https://x.com/paasdev
LinkedIn: https://www.linkedin.com/company/zilliz/ https://www.linkedin.com/in/timothyspann/
GitHub: https://github.com/milvus-io/milvus https://github.com/tspannhw
Invitation to join Discord: https://discord.com/invite/FjCMmaJng6
Blogs: https://milvusio.medium.com/ https://www.opensourcevectordb.cloud/ https://medium.com/@tspann
Expand LLMs' knowledge by incorporating external data sources into LLMs and your AI applications.
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr... (Marlon Dumas)
This webinar discusses the limitations of traditional approaches to business process simulation based on hand-crafted models with restrictive assumptions. It shows how process mining techniques can be assembled together to discover high-fidelity digital twins of end-to-end processes from event data.
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
To honor ten years of PyData London, join Dr. Rebecca Bilbro as she takes us back in time to reflect on a little over ten years working as a data scientist. One of the many renegade PhDs who joined the fledgling field of data science of the 2010's, Rebecca will share lessons learned the hard way, often from watching data science projects go sideways and learning to fix broken things. Through the lens of these canon events, she'll identify some of the anti-patterns and red flags she's learned to steer around.
We are pleased to share with you the latest VCOSA statistical report on the cotton and yarn industry for the month of March 2024.
Starting from January 2024, the full weekly and monthly reports will only be available for free to VCOSA members. To access the complete weekly report with figures, charts, and detailed analysis of the cotton fiber market in the past week, interested parties are kindly requested to contact VCOSA to subscribe to the newsletter.
2. • Low-cost, on-demand transports
• Mainly to and from airports
• Two-sided marketplace
• Secret sauce:
packaging + smart routing + risk management
!
Comfort of a taxi for the price of a bus.
6. Quick wins
• Find something that is losing money
• Find a conversion optimization
!
Quick wins should be obvious in hindsight (and yet nobody thought of them until you came along)
7. From reports to real-time thinking support:
• 2 min
• 20 min
• fail
• project
Minimize analysis friction
26. Becoming data-driven: a quick checklist
✓ Invest in data gathering
✓ Find quick wins
✓ Be methodical
✓ Focus on one KPI at a time, but change often
✓ Beware of dead salmon