What is good data visualisation? How do we apply best practices of data visualisation at scale? How do we make sure that all visualisations produced by your analytics team both look good and are effortless?
Driver vs Driverless AI - Mark Landry, Competitive Data Scientist and Product... | Sri Ambati
Presented at #H2OWorld 2017 in Mountain View, CA.
Enjoy the video: https://youtu.be/U-ENrMUQcJs.
Learn more about H2O.ai: https://www.h2o.ai/.
Follow @h2oai: https://twitter.com/h2oai.
- - -
Mark Landry is a competitive data scientist and a product manager at H2O.ai. He is well-trained in getting quick solutions to iterate over and enjoys testing ideas in Kaggle competitions, where his worldwide ranking stands in the top 0.03%. His first encounter with H2O.ai came while he was hacking in R. He reached out to Arno Candel (CTO, H2O.ai) to team up in a Kaggle competition, as he felt it would be exciting to work with the lead developers of the tool that contributed to his work in R. Mark holds a B.S. in Computer Science from Mississippi State University and was a Principal Engineer at Dell before joining H2O.ai. At Dell, he spearheaded data modeling and project support for the business transformation team, and also developed analytical tools and machine learning models to increase business efficiency.
Mark was the first Kaggle Grandmaster to be employed at H2O.ai and he enabled inroads into the Kaggle community for H2O. At H2O.ai, he has helped modernize the GBM algorithm and provided guidance on multiple projects before being pulled into one of H2O’s biggest projects. He holds interests in multi-model architectures and helps the world make fewer models that perform worse than the mean. He also has a number of publications to his name with multiple citations from the industry and academia alike.
Building Fullstack Graph Applications With Neo4j | Neo4j
This document provides an overview of graph databases and algorithms using Neo4j. It discusses Neo4j's built-in graph algorithms for pathfinding, centrality, community detection, similarity and link prediction. It also covers Neo4j Streams for real-time graph processing and integrations with Kafka. Grandstack and Neo4j-GraphQL are presented as options for building GraphQL APIs on Neo4j.
Introduction to Distributed Computing Engines for Data Processing - Simone Ro... | Data Science Milan
This document provides an introduction to distributed computing engines for data processing. It discusses what distributed computing systems are and how they address the problem of data and tasks being too large for a single machine. It then covers key distributed computing systems like Hadoop, Spark and Flink. For each system, it summarizes what it is, when and where it originated, why it was created, and how it works at a high level. It also provides brief examples of common use cases for each system today.
Drifting Away: Testing ML Models in Production | Databricks
Deploying machine learning models has become a relatively frictionless process. However, properly deploying a model with a robust testing and monitoring framework is a vastly more complex task. There is no one-size-fits-all solution when it comes to productionizing ML models, which often requires custom implementations utilising multiple libraries and tools. There is, however, a set of core statistical tests and metrics one should have in place to detect phenomena such as data and concept drift, so that models do not become stale and detrimental to the business without anyone noticing.
Combining our experiences from working with Databricks customers, we do a deep dive on how to test your ML models in production using open source tools such as MLflow, SciPy and statsmodels. You will come away from this talk armed with knowledge of the key tenets for testing both model and data validity in production, along with a generalizable demo which uses MLflow to assist with the reproducibility of this process.
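As a rough illustration of the kind of statistical test the abstract refers to, the sketch below flags drift in a single numeric feature with a two-sample Kolmogorov-Smirnov test. It is not taken from the talk: the function names `ks_statistic` and `drift_detected` are hypothetical, and the test is written with the standard library only so that it stays self-contained. In practice `scipy.stats.ks_2samp` computes the same statistic together with a p-value.

```python
import bisect
import math


def ks_statistic(reference, current):
    """Return the largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    for x in ref + cur:
        # Fraction of each sample that is less than or equal to x.
        f_ref = bisect.bisect_right(ref, x) / len(ref)
        f_cur = bisect.bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(f_ref - f_cur))
    return gap


def drift_detected(reference, current, c_alpha=1.36):
    """Flag drift when the KS statistic exceeds the large-sample critical value.

    c_alpha=1.36 corresponds to a significance level of roughly 0.05.
    """
    n, m = len(reference), len(current)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


# Hypothetical feature values: training-time sample vs. a shifted production sample.
training = list(range(100))
production = [x + 50 for x in range(100)]
print(drift_detected(training, training))    # identical samples
print(drift_detected(training, production))  # shifted sample
```

A check like this would typically run per feature on a schedule, with the reference sample frozen at training time.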
Using H2O AutoML for Kaggle Competitions | Sri Ambati
The document discusses H2O AutoML, a tool for automated machine learning. It begins by showing some top Kagglers who have used AutoML to achieve good results with less effort. It then provides an overview of what AutoML automates in the model building process, such as preprocessing, training, tuning, and stacking ensembles. AutoML is suitable for novice users who want automation as well as experts who want to save time on routine tasks. The document explains the interface and shows the grid search, stacking and cross-validation process done behind the scenes to build accurate models with less work.
Data analytic for mobile app development | Trieu Nguyen
This document discusses using data analytics for mobile app development. It recommends analyzing user behavior and interests through metrics like users, sessions, and events to improve the user experience and inform business decisions. The document provides an example of a mobile advertising app that tracked user taps and social sharing to generate analytics and integrate with Facebook data. It advocates keeping analytics implementations simple while designing architectures that can handle large volumes of data.
Presented at #H2OWorld 2017 in Mountain View, CA.
Enjoy the video: https://youtu.be/WKAuXlsq6xw.
Learn more about H2O.ai: https://www.h2o.ai/.
Follow @h2oai: https://twitter.com/h2oai.
- - -
Leaderboard shake-up and overfitting are commonly known problems in Kaggle competitions. In his talk, Dmitry shares an interesting approach to validating model performance that has proven useful in Kaggle competitions with noisy data.
Dmitry Larko's Bio:
A Senior Data Scientist at H2O.ai, Dmitry is also a former #25 Kaggle Grandmaster and loves to use his machine learning and data science skills in Kaggle competitions and predictive analytics software development.
He has more than 15 years of experience in information technology. After completing his master's in computer information systems at Krasnoyarsk State Technical University (KSTU), he started his career in data warehousing and business intelligence and gradually moved to big data and data science.
He has extensive experience in predictive analytics across a wide array of domains and tasks. Prior to H2O.ai, Dmitry held the positions of SAP BW Developer at Chevron, Data Scientist at EPAM, and Lead Software Engineer with the Russian Federation.
This document summarizes a master's thesis presentation about the Teamsketch app. The app allows 2-4 students to sketch collaboratively on iPads in real-time without internet. It was tested at a primary school where students worked in teams on sketches. The test found collaboration and teamwork skills were improved but also identified areas for improvement. Going forward the presenter aims to refine the app based on feedback and develop it into a commercial collaborative sketching app for designers.
Production machine learning_infrastructure | joshwills
This document discusses building machine learning infrastructure to scale data science from the lab to production. It describes two types of data scientists - those focused on investigative analytics in the lab and those building production systems in the factory. Moving analytics from the lab to the factory requires a shift from question-driven and ad-hoc work to metric-driven and automated systems. The document outlines steps to begin this transition such as choosing a good problem, logging everything, and hiring more data scientists. It also describes tools and techniques for experimentation in production machine learning.
Detecting Anomalous Behavior with Surveillance Analytics | Databricks
This document discusses using surveillance analytics to automatically detect anomalous behavior and potential crimes without manual monitoring. It outlines challenges with traditional surveillance methods and proposes a solution using object detection models to identify abandoned objects, loitering individuals, and unauthorized vehicles in real-time video feeds. Key aspects of the proposed architecture include preprocessing video data for analytics, scaling the system using Databricks Delta storage, and experiment management with MLflow. A demonstration of the live video analytics pipeline showed it could process frames at 25-30 FPS on a GPU system while updating activity logs and video summaries to a central monitoring system.
Predicting Medical Test Results using Driverless AI | Sri Ambati
1. poder.IO uses AI to predict customer behavior and personalize experiences. It deploys over 100 models daily using techniques like regression, classification, text analysis and deep learning.
2. Driverless AI is currently used to benchmark models before production and for research cases. It may be used starting Q3 2018 for advertising optimization, content classification, profile matching and look-alike modeling.
3. A joint team from poder.IO and Bayer developed models to predict individual medical test results using healthcare data, without direct lab measures. This could help improve treatment strategies. They used techniques like GLM, GBM, random forest and Driverless AI to develop and compare models for a medical test, finding Driver
How to keep yourself up to date with changes in the technology world.
More details: http://trishagee.github.io/presentation/staying_ahead_of_the_curve/
This talk was given at H2O World 2018 NYC and can be viewed here: https://youtu.be/oxLZZMR1lVY
Description:
Driverless AI is H2O.ai's latest flagship product for automatic machine learning. It fully automates some of the most challenging and productive tasks in applied data science such as feature engineering, model tuning, model ensembling and model deployment. Driverless AI turns Kaggle-winning grandmaster recipes into production-ready code, and is specifically designed to avoid common mistakes such as under- or overfitting, data leakage or improper model validation, some of the hardest challenges in data science. Avoiding these pitfalls alone can save weeks or more for each model, and is necessary to achieve high modeling accuracy, especially for time-series problems.
With Driverless AI, data scientists of all proficiency levels can train and deploy modeling pipelines with just a few clicks from the GUI. Advanced users can use the client API from Python. Driverless AI builds hundreds or thousands of models under the hood to select the best feature engineering and modeling pipeline for every specific problem such as churn prediction, fraud detection, real-estate pricing, store sales prediction, marketing ad campaigns and many more.
To speed up training, Driverless AI uses highly optimized C++/CUDA algorithms to take full advantage of the latest compute hardware. For example, Driverless AI runs orders of magnitude faster on the latest Nvidia GPU supercomputers on Intel and IBM platforms, both in the cloud and on premises. Driverless AI is fully supported on all major cloud providers.
There are two more product innovations in Driverless AI: statistically rigorous automatic data visualization and machine learning interpretability with reason codes and explanations in plain English. Both help data scientists and analysts to quickly validate the data and the models.
In this talk, we explain how Driverless AI works and show how easy it is to reach top 5% rankings for several highly competitive Kaggle competitions.
Speaker's Bio:
Arno Candel is the Chief Technology Officer at H2O.ai. He is the main committer of H2O-3 and Driverless AI and has been designing and implementing high-performance machine-learning algorithms since 2012. Previously, he spent a decade in supercomputing at ETH and SLAC and collaborated with CERN on next-generation particle accelerators. Arno holds a PhD and Masters summa cum laude in Physics from ETH Zurich, Switzerland. He was named “2014 Big Data All-Star” by Fortune Magazine and featured by ETH GLOBE in 2015. Follow him on Twitter: @ArnoCandel.
Data Science as a Service: Intersection of Cloud Computing and Data Science | Pouria Amirian
Dr. Pouria Amirian explains data science and the steps in a data science workflow, and shows some experiments in AzureML. He also discusses big data issues in a data science project and solutions to them.
Helping data scientists escape the seduction of the sandbox - Krish Swamy, We... | Sri Ambati
This talk was given at H2O World 2018 NYC and can be viewed here: https://youtu.be/xc3j20Om3UM
Description:
Data science is indeed one of the sexy jobs of the 21st century. But it is also a lot of hard work. And the hard work is seldom about the math or the algorithms. It is about building relevant machine learning products for the real world. We will go over some of the must-haves as you take your machine learning model out of the sandbox and make it work in the big, bad world outside.
Speaker's Bio:
Krish Swamy is an experienced professional with deep skills in applying analytics and BigData capabilities to challenging business problems and driving customer insights. Krish's analytic experience includes marketing and pricing, credit risk, digital analytics and most recently, big data analytics and data transformation. His key experiences lie in banking and financial services, the digital customer experience domain, with a background in management consulting. Other key skills include influencing organizational change towards a data and analytics driven culture, and building teams of analysts, statisticians and data scientists.
Importance of ML Reproducibility & Applications with MLfLow | Databricks
With data as a valuable currency and the architecture of reliable, scalable Data Lakes and Lakehouses continuing to mature, it is crucial that machine learning training and deployment techniques keep up to realize value. Reproducibility, efficiency, and governance in training and production environments rest on the shoulders of both point in time snapshots of the data and a governing mechanism to regulate, track, and make best use of associated metadata.
This talk will outline the challenges and importance of building and maintaining reproducible, efficient, and governed machine learning solutions as well as posing solutions built on open source technologies – namely Delta Lake for data versioning and MLflow for efficiency and governance.
Promoting a Data Driven Culture in a Microservices Environment | PyData
This document discusses Hudl's journey to democratize data access and promote a data-driven culture. It outlines Hudl's data engineering approach, including using AWS Redshift as the data warehouse, Luigi for workflow management, and Sqoop and Spark for data extraction and transformation. It also details Hudl's data analytics efforts like educating employees on SQL, building derived tables and automated reports, and implementing a self-service query and visualization tool called re:dash to help employees access and use data to make better decisions. The goal is to remove roadblocks to working with data and make data and metrics accessible to all at Hudl.
These slides were presented by me at the PHPIndonesia and FemaleGeek Meetup on 18 June 2016.
On this occasion, I shared how Kudo started and organizes our data team and, more technically, how Kudo uses and implements ETL and machine learning.
Overview of classical statistical tests and how to apply them in Python to real-world problems in an online setting. A/B testing. Confidence intervals. Bayesian and frequentist methods. Presented at PyData Dallas 2015.
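To make the A/B testing and confidence interval topics concrete, here is a minimal frequentist sketch: a two-proportion z-test with a normal-approximation confidence interval for the difference in conversion rates. It is an illustration under stated assumptions rather than material from the talk. The function name `ab_test` and the visitor counts are hypothetical, and the normal approximation assumes reasonably large samples.

```python
import math
from statistics import NormalDist


def ab_test(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Two-sided z-test for a difference in conversion rates (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a

    # Under the null hypothesis both variants share one pooled rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_null = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # The confidence interval uses the unpooled standard error.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    return {
        "diff": diff,
        "z": z,
        "p_value": p_value,
        "ci": (diff - z_crit * se, diff + z_crit * se),
    }


# Hypothetical experiment: 100/1000 conversions for A, 150/1000 for B.
result = ab_test(100, 1000, 150, 1000)
print(result["p_value"], result["ci"])
```

A Bayesian treatment of the same data would instead place a prior (for example a Beta distribution) on each conversion rate and compare the posteriors.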
The document discusses Bayesian model averaging (BMA) as a Bayesian approach to combining multiple models and explains how to implement BMA using R packages. It highlights that BMA works well for linear models but that its application to more complex models is still limited, and it concludes by noting that BMA provides useful tools for model interpretation and combination beyond just prediction.
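The core arithmetic of model averaging can be sketched in a few lines. The example below uses the common BIC approximation to posterior model probabilities and assumes equal prior probability for every model. It is written in Python purely for illustration: the document itself works with R packages, and the function names and BIC values here are hypothetical.

```python
import math


def bma_weights(bics):
    """Approximate posterior model probabilities from BIC values.

    Assumes equal prior probability for every model. Subtracting the
    smallest BIC first keeps the exponentials numerically stable.
    """
    best = min(bics)
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


def bma_predict(predictions, weights):
    """Average the per-model predictions using the model weights."""
    return sum(p * w for p, w in zip(predictions, weights))


# Hypothetical BIC values and point predictions from three fitted models.
weights = bma_weights([100.0, 102.0, 110.0])
print(weights)
print(bma_predict([2.0, 2.5, 4.0], weights))
```

The weights also serve the interpretive role the summary mentions: a model with a near-zero weight contributes almost nothing to the averaged prediction.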
How do you organise a Jupyter IPython notebook research project so that you, as well as others, are able to read, understand and reproduce your work? How big should a notebook be? What should go in one cell? How do the Clean Code principles outlined by Robert C. Martin, aka Uncle Bob, relate to Python and more specifically to IPython?
Is Agile Data Science just two buzzwords put together? I argue that agile is a very practical and applicable methodology that works well in the real world for all sorts of Analytics and Data Science workflows.
http://theinnovationenterprise.com/summits/digital-web-analytics-summit-london-2015/schedule
Ld perda no. 8-rencana umum tata ruang kota sumur | Mrj Iwan
This Regional Regulation governs the General Spatial Plan for the town of Sumur in Pandeglang Regency. The plan aims to give direction to the development of cultivation and non-cultivation areas in Sumur and to serve as the basis on which the Pandeglang Regency government determines locations and land-use recommendations.
The document summarizes the daily activities and lessons from a writing class. It describes:
- A morning free write activity quoting Paul Laurence Dunbar.
- Presentations from students on various informative topics like newspapers, governments, butterflies and more.
- A lecture on opinion and argument writing focusing on using evidence to support claims.
- Book club meetings and a portfolio reflection activity where students reviewed their narrative, informative pieces and free writes.
台東娜路彎大酒店簡介 (Formosan Naruwan Hotel & Resort Taitung) | Rebecca Chen
In the indigenous language, "Naruwan" means both "hello" and "welcome", much like "Aloha" in Hawaii, and conveys warmth, ease, and joy. The five-star Formosan Naruwan Hotel Taitung takes its name from this greeting in the hope that every guest will feel the sincere welcome of the entire Naruwan staff.
Taitung is the home of the rising sun, and its abundant, dazzling sunshine is our greatest asset. From the earliest design stage, the hotel was therefore planned as a sloped-back building with large expanses of clear glass that bring as much sunlight and surrounding mountain and sea scenery indoors as possible. Its distinctive triangular exterior has become a new landmark in Taitung.
The hotel is itself a work of art that brings together modern style and indigenous culture. The four stone pillars at the main entrance each stand nine meters tall and are made of sandstone from Shanxi Province in mainland China. They are carved in relief with scenes from the ancestral life and stories of the Rukai, Tao, Puyuma, and Amis peoples, silently recounting a history handed down over generations.
Production machine learning_infrastructurejoshwills
This document discusses building machine learning infrastructure to scale data science from the lab to production. It describes two types of data scientists - those focused on investigative analytics in the lab and those building production systems in the factory. Moving analytics from the lab to the factory requires a shift from question-driven and ad-hoc work to metric-driven and automated systems. The document outlines steps to begin this transition such as choosing a good problem, logging everything, and hiring more data scientists. It also describes tools and techniques for experimentation in production machine learning.
Detecting Anomalous Behavior with Surveillance AnalyticsDatabricks
This document discusses using surveillance analytics to automatically detect anomalous behavior and potential crimes without manual monitoring. It outlines challenges with traditional surveillance methods and proposes a solution using object detection models to identify abandoned objects, loitering individuals, and unauthorized vehicles in real-time video feeds. Key aspects of the proposed architecture include preprocessing video data for analytics, scaling the system using Databricks Delta storage, and experiment management with MLflow. A demonstration of the live video analytics pipeline showed it could process frames at 25-30 FPS on a GPU system while updating activity logs and video summaries to a central monitoring system.
Predicting Medical Test Results using Driverless AISri Ambati
1. poder.IO uses AI to predict customer behavior and personalize experiences. It deploys over 100 models daily using techniques like regression, classification, text analysis and deep learning.
2. Driverless AI is currently used to benchmark models before production and for research cases. It may be used starting Q3 2018 for advertising optimization, content classification, profile matching and look-alike modeling.
3. A joint team from poder.IO and Bayer developed models to predict individual medical test results using healthcare data, without direct lab measures. This could help improve treatment strategies. They used techniques like GLM, GBM, random forest and Driverless AI to develop and compare models for a medical test, finding Driver
How to keep yourself up to date with changes in the technology world.
More details: http://trishagee.github.io/presentation/staying_ahead_of_the_curve/
This talk was given at H2O World 2018 NYC and can be viewed here: https://youtu.be/oxLZZMR1lVY
Description:
Driverless AI is H2O.ai's latest flagship product for automatic machine learning. It fully automates some of the most challenging and productive tasks in applied data science such as feature engineering, model tuning, model ensembling and model deployment. Driverless AI turns Kaggle-winning grandmaster recipes into production-ready code, and is specifically designed to avoid common mistakes such as under- or overfitting, data leakage or improper model validation, some of the hardest challenges in data science. Avoiding these pitfalls alone can save weeks or more for each model, and is necessary to achieve high modeling accuracy, especially for time-series problems.
With Driverless AI, data scientists of all proficiency levels can train and deploy modeling pipelines with just a few clicks from the GUI. Advanced users can use the client API from Python. Driverless AI builds hundreds or thousands of models under the hood to select the best feature engineering and modeling pipeline for every specific problem such as churn prediction, fraud detection, real-estate pricing, store sales prediction, marketing ad campaigns and many more.
To speed up training, Driverless AI uses highly optimized C++/CUDA algorithms to take full advantage of the latest compute hardware. For example, Driverless AI runs orders of magnitudes faster on the latest Nvidia GPU supercomputers on Intel and IBM platforms, both in the cloud or on premise. Driverless AI is fully supported on all major cloud providers.
There are two more product innovations in Driverless AI: statistically rigorous automatic data visualization and machine learning interpretability with reason codes and explanations in plain English. Both help data scientists and analysts to quickly validate the data and the models.
In this talk, we explain how Driverless AI works and show how easy it is to reach top 5% rankings for several highly competitive Kaggle competitions. (edited)
Speaker's Bio:
Arno Candel is the Chief Technology Officer at H2O.ai. He is the main committer of H2O-3 and Driverless AI and has been designing and implementing high-performance machine-learning algorithms since 2012. Previously, he spent a decade in supercomputing at ETH and SLAC and collaborated with CERN on next-generation particle accelerators. Arno holds a PhD and Masters summa cum laude in Physics from ETH Zurich, Switzerland. He was named “2014 Big Data All-Star” by Fortune Magazine and featured by ETH GLOBE in 2015. Follow him on Twitter: @ArnoCandel.
Data Science as a Service: Intersection of Cloud Computing and Data SciencePouria Amirian
Dr. Pouria Amirian explains data science, steps in a data science workflow and show some experiments in AzureML. He also mentions about big data issues in a data science project and solutions to them.
Helping data scientists escape the seduction of the sandbox - Krish Swamy, We...Sri Ambati
This talk was given at H2O World 2018 NYC and can be viewed here: https://youtu.be/xc3j20Om3UM
Description:
Data science is indeed one of the sexy jobs of the 21st century. But it is also a lot of hard work. And the hard work is seldom about the math or the algorithms. It is about building relevant machine learning products for the real world. We will go over some of the must-haves as you take your machine learning model out of the sandbox and make it work in the big, bad world outside.
Speaker's Bio:
Krish Swamy is an experienced professional with deep skills in applying analytics and BigData capabilities to challenging business problems and driving customer insights. Krish's analytic experience includes marketing and pricing, credit risk, digital analytics and most recently, big data analytics and data transformation. His key experiences lie in banking and financial services, the digital customer experience domain, with a background in management consulting. Other key skills include influencing organizational change towards a data and analytics driven culture, and building teams of analysts, statisticians and data scientists.
Importance of ML Reproducibility & Applications with MLfLowDatabricks
With data as a valuable currency and the architecture of reliable, scalable Data Lakes and Lakehouses continuing to mature, it is crucial that machine learning training and deployment techniques keep up to realize value. Reproducibility, efficiency, and governance in training and production environments rest on the shoulders of both point in time snapshots of the data and a governing mechanism to regulate, track, and make best use of associated metadata.
This talk will outline the challenges and importance of building and maintaining reproducible, efficient, and governed machine learning solutions as well as posing solutions built on open source technologies – namely Delta Lake for data versioning and MLflow for efficiency and governance.
Promoting a Data Driven Culture in a Microservices Environment PyData
This document discusses Hudl's journey to democratize data access and promote a data-driven culture. It outlines Hudl's data engineering approach, including using AWS Redshift as the data warehouse, Luigi for workflow management, and Sqoop and Spark for data extraction and transformation. It also details Hudl's data analytics efforts like educating employees on SQL, building derived tables and automated reports, and implementing a self-service query and visualization tool called re:dash to help employees access and use data to make better decisions. The goal is to remove roadblocks to working with data and make data and metrics accessible to all at Hudl.
These slides were presented by me at the PHPIndonesia and FemaleGeek Meetup on 18th June, 2016.
On this occasion, I shared how Kudo started and organizes its data team and, more technically, how Kudo uses and implements ETL and machine learning.
Overview of classical statistical tests and how to apply them in Python to real-world problems in an online setting. A/B testing. Confidence intervals. Bayesian and frequentist methods. Presented at PyData Dallas 2015.
The document discusses Bayesian model averaging (BMA) as a Bayesian approach to combining multiple models, explaining how to implement BMA using R packages, highlighting that BMA works well for linear models but its application to more complex models is still limited, and concludes by noting that BMA provides useful tools for model interpretation and combination beyond just prediction.
How do you organise a Jupyter/IPython notebook research project so that you, as well as others, are able to read, understand and reproduce your work? How big should a notebook be? What should go in one cell? How do the Clean Code principles outlined by Robert C. Martin, aka Uncle Bob, relate to Python and more specifically to IPython?
Is Agile Data Science just two buzzwords put together? I argue that agile is a very practical and applicable methodology, that does work well in the real world for all sorts of Analytics and Data Science workflows.
http://theinnovationenterprise.com/summits/digital-web-analytics-summit-london-2015/schedule
Ld perda no. 8-rencana umum tata ruang kota sumur Mrj Iwan
This Regional Regulation governs the General Spatial Plan for the town of Sumur in Pandeglang Regency. The plan aims to give direction for developing cultivation and non-cultivation areas in Sumur, and to serve as the basis for the Pandeglang Regency government in determining locations and land-use recommendations.
The document summarizes the daily activities and lessons from a writing class. It describes:
- A morning free write activity quoting Paul Laurence Dunbar.
- Presentations from students on various informative topics like newspapers, governments, butterflies and more.
- A lecture on opinion and argument writing focusing on using evidence to support claims.
- Book club meetings and a portfolio reflection activity where students reviewed their narrative, informative pieces and free writes.
台東娜路彎大酒店簡介 (Formosan Naruwan Hotel & Resort Taitung) Rebecca Chen
"Naruwan" is a greeting in a Taiwanese indigenous language that means "How are you?" and "Welcome!", much like "Aloha" in Hawaii, and it conveys warmth, ease and joy. The five-star resort Formosan Naruwan Hotel Taitung takes its name from this greeting in the hope that every guest will feel the sincere welcome of all Naruwan staff.
Taitung is the home of the rising sun, and its abundant, dazzling sunshine is our greatest asset. From the start of design and construction, the building was therefore given a sloped-back form with large areas of transparent glass that bring the maximum sunlight and the surrounding mountain and sea views indoors. Its distinctive triangular exterior has become a new landmark in Taitung.
Formosan Naruwan Hotel Taitung is itself a work of art that brings together modern style and indigenous culture. The four stone pillars at the main entrance are each 9 meters high and made of sandstone from Shanxi Province in mainland China. They are carved in relief with scenes from the ancestral life and stories of the Rukai, Tao, Puyuma and Amis peoples, silently telling a long-standing history.
The document is a program outline for a week-long outdoor education trip organized by the Toronto Catholic District School Board's Outdoor Education Department. It will involve spiritual growth activities, orienteering, wilderness survival skills training, a ropes course, reenactments of fur trade and settler games, and a wolf prowl simulation. Students are reminded to bring cameras but leave behind medications, electronics, junk food, and are able to call home. The trip aims to be an engaging week of outdoor learning experiences.
1. The 2010 annual report on court information services gives a general overview of the court's efforts to improve information services for the public through better facilities, infrastructure and human resources.
The document analyzes student survey data from the National Assessment of Educational Progress to assess how challenged and engaged students feel in school. Some key findings include:
- Many students report that their schoolwork is too easy, with 37% of 4th graders saying math work is too easy.
- Students are not engaged in rigorous activities, with over 30% of 8th graders writing long reading answers twice a year or less.
- Most students say they are not taught engineering and technology in science class, with 72% of 8th grade science students reporting this.
- Students from disadvantaged backgrounds are less likely to report understanding teachers or having access to rigorous opportunities.
Update on launch of the Digital & Creative Career College, one of the first three Career Colleges to launch in the UK in September 2014.
Based within the campus of Oldham College in Greater Manchester, the DCCC works with industry to help students develop skills for success in digitally driven industries.
PackeTV is Visionary Solutions' modular end-to-end IP video management solution that enables secure scheduled and on-demand delivery of live and recorded video to any screen. It allows organizations to deliver high-quality HD/SD video over various networks in a simple and cost-effective manner. PackeTV features a centralized management system, modular flexibility to grow over time, and secure delivery of content through encryption.
Teaching and research with MIKE by DHI - Dr Björn Elsäßer (Queen’s University... Stephen Flood
This document summarizes the use of MIKE software at Queen's University Belfast for teaching and research purposes. It discusses how MIKE is used in various courses to model coastal engineering processes and tidal energy. It also describes several research projects using MIKE to model wave energy converter arrays, sewage outfall impacts, horse mussel larval transport, and more. The document emphasizes that MIKE provides an easy interface for students to learn modeling while also serving as a valuable research tool.
In order to simplify and consolidate HLS installations, Visionary Solutions created the PackeTV® Mobile HLS, a single device that performs content preparation and delivery. This integrated file server can support hundreds of users, eliminating the need for content delivery network (CDN) services.
Equipped with two gigabit Ethernet ports that provide a substantial amount of network bandwidth, the unit can store hundreds of hours of pre-recorded content. The entry-level system is housed in a single, compact 1RU chassis that fits perfectly into any standard 20-inch deep AV rack.
The PackeTV® Mobile HLS server supports all of the functions needed to accept H.264 video streams (real-time or file-based) and deliver HLS streams, including content preparation, file storage, and content delivery. All of this functionality is contained within a server that has been specifically designed to optimize throughput and ease of use. PackeTV® Mobile HLS dramatically lowers operational costs compared with systems that use traditional streaming CDNs to simultaneously distribute video to multiple clients. System ownership ensures seamless, around-the-clock availability of the video streams. Each video stream can be published once and made available to all viewers with a simple set of user commands. Occasional users will find that this single, integrated system eliminates much of the configuration complexity that normally occurs when multiple subsystems from different manufacturers and service providers need to be integrated to form a complete solution. Heavy users will appreciate the flexibility that is available within the device configuration menus, which allow system operations to be customized to accommodate a wide range of bit rates, signal formats, and target devices. Also, because standard HTTP Web-server technologies are used for content delivery, the added fees required for high throughput streaming service providers such as CDNs are eliminated, saving the content provider money.
With an on-site PackeTV® Mobile HLS, content asset management can be greatly simplified and centralized on a single server. A single video file can be created and delivered to an organization’s internal and external viewers, eliminating the need to manage multiple streams on different servers. Network bandwidth is also used more efficiently, as chunks can be downloaded quickly, and each client device only consumes as much data as the quality of its network connection allows. More than half of all video is consumed on a Wi-Fi device, according to Streaming Media, so content providers must make mobile content delivery an immediate priority.
This document presents a concept map of the characteristics of project management and its life cycle, including the phases, the people responsible and the development times. It highlights the importance of planning and managing resources properly to achieve project success. It concludes that success depends on the planning and management of resources, that it is the director's responsibility to define objectives, actions and timelines, and that when it is carried out properly the project completes the phases and
This document summarizes new tools that supplement Pentaho Open BI CE and improve usability of business intelligence solutions. It describes STPivot, an OLAP viewer with enhanced features like Ajax interface and new charts. STDashboard is an easy to use dashboard creator for end users to analyze and share dashboards. STCard allows creating and managing scorecards with indicators, perspectives and KPIs to monitor objectives. The tools are mobile compliant and their components can be embedded in applications.
The document discusses alternatives to using SAP servers for reporting, including using Crystal Reports or free software. It analyzes using existing Crystal Reports templates but notes limitations without the SAP server. Windward Studios is presented as a solution that allows template application without errors and supports exporting to databases. The conclusion is that Windward Studios provides advantages over Crystal Reports like avoiding SAP servers and support for cross-platform use in Microsoft products.
IT10856 - AutoCAD Tool Palettes Master Class (Presentation) Paul Munford
This document summarizes a presentation about using tool palettes in AutoCAD to maintain CAD standards. It discusses creating tool palettes on a network drive with standardized tools, layers and styles. It also covers creating administrator and user profiles to deploy the tool palettes and ensure compliance with the CAD standard. The presentation provides steps for setting up the folders, profiles and tool palettes as well as deploying and updating them.
solidworks vs mytools utilities features shezperera97
The document describes myCADtools, a software add-on for SOLIDWORKS. It provides over 30 utilities to enhance SOLIDWORKS functionality. Some key utilities described are SelectMaterial for applying materials to parts, CurveEquation for creating curves from mathematical equations, DriveAssembly for animating assemblies based on dimensions, and AssemblyBoard for automatically generating exploded views and bills of materials from assemblies. The utilities aim to improve efficiency, standardization, and communication compared to using SOLIDWORKS alone.
NGO Analytics solution based on open source including KPIs, reports, OLAP Analysis, Dashboards, Scorecards, Big Data and Machine Learning with 'predefined templates, dashboards and KPIs/ratios' and fully customizable environment
In celebration of Maker Week, the Virginia Tech Northern Virginia Center hosted a 3DPrinting Day. This presentation is on how to use OpenSCAD (http://openscad.org) for 3D modeling.
Mycadtools is a software suite consisting of more than 50 utilities fully integrated with Solidworks. It adapts the CAD tool to your organization and work methods, reduces the time lost to CAD data management in favour of design and innovation, and enriches the basic CAD capabilities by taking into account your specific business requirements.
Mathcad 15.0 is a software tool that allows engineers to perform, document, manage, and share calculations. It combines equations, text, and graphs into a single worksheet. This makes it easy to capture and document engineering work. Mathcad also enables collaboration and integration with other applications. It automatically documents calculations to simplify compliance and troubleshooting.
Education Analytics solution based on open source including KPIs, reports, OLAP Analysis, Dashboards, Scorecards, Big Data and Machine Learning with 'predefined templates, dashboards and KPIs/ratios' and fully customizable environment
Knowledge sharing session of the Salesforce Lausanne, Switzerland User Group about useful tools for Swiss companies and useful apps and chrome extensions for all Salesforce Administrators.
Speakers: Giuseppe Cardace, Ralf Roijakkers, John Sas, Madog Williams
Tourism Analytics solution based on open source including KPIs, reports, OLAP Analysis, Dashboards, Scorecards, Big Data and Machine Learning with 'predefined templates, dashboards and KPIs/ratios' and fully customizable environment
Seattle DAA - Data Visualization - Russell Spangler December 2019 Russell Spangler
I presented at the Seattle DAA conference on Microsoft's conference. The presentation goes over principles and tips of data visualization and talks about where to find inspiration for building awesome visuals!
Empowering the AWS DynamoDB™ application developer with Alternator ScyllaDB
Getting started with AWS DynamoDB™ is famously easy, but as an application grows and evolves it often starts to struggle with DynamoDB’s limitations. We introduce Scylla’s Alternator, which provides the same API as DynamoDB but aims to empower the application developer. In this presentation we will survey some of Alternator’s developer-centered features: Alternator lets you test and eventually deploy your application anywhere, on any public cloud or private cluster. It efficiently supports multiple tables so it does not require difficult single-table design. Finally, Alternator provides the developer with strong observability tools. The insights provided by these tools can detect bottlenecks, improve performance and even lower its cost.
Utilities Analytics solution based on open source including KPIs, reports, OLAP Analysis, Dashboards, Scorecards, Big Data and Machine Learning with 'predefined templates, dashboards and KPIs/ratios' and fully customizable environment
Vikas Mahajan gave a presentation on AutoCAD software. AutoCAD is a CAD software application used for 2D and 3D design. It was one of the first CAD programs to run on personal computers when initially released in 1982. The presentation demonstrated AutoCAD's capabilities like reducing complex tasks through associative features, decreasing errors with tools like IGES translation, and improving productivity with automatic dimensioning and intelligent symbols.
Finance Analytics solution based on open source including KPIs, reports, OLAP Analysis, Dashboards, Scorecards, Big Data and Machine Learning with 'predefined templates, dashboards and KPIs/ratios' and fully customizable environment
MW2011 Grid-based Web Design presentation Charlie Moad
This document discusses the benefits of using grid-based web design. It provides a brief history of grid design and influential designers like Emil Ruder and Josef Müller-Brockmann. Grids offer benefits to designers, developers, and content authors by providing structure and consistency. A case study of redesigning the Indianapolis Museum of Art's website using a grid is presented. Tools for implementing grids are also reviewed. The document argues that grids will remain a relevant design approach as new devices emerge.
Human Resources Analytics solution based on open source including KPIs, reports, OLAP Analysis, Dashboards, Scorecards, Big Data and Machine Learning with 'predefined templates, dashboards and KPIs/ratios' and fully customizable environment
Learn SQL from basic queries to Advance queries manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
The Building Blocks of QuestDB, a Time Series Database javier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps as just another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
End-to-end pipeline agility - Berlin Buzzwords 2024 Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data Kiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Analysis insight about a Flyball dog competition team's performance roli9797
Insights from my analysis of a Flyball dog competition team's performance over the last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake Walaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world, where data privacy and compliance are a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences. (3) They are context-aware, encoding a different set of transformations for different use cases. (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
State of Artificial intelligence Report 2023 kuntobimo2016
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report as a compilation of the most interesting things we’ve seen with a goal of triggering an informed conversation about the state of AI and its implication for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You... Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
2. volodymyrk
About myself
● MS Math, Probability Theory (Kiev, 1999-2004)
● Graphics Programming, Video Games (Kiev, 2002-2005)
● Visual Effect Programming (Berlin, Sydney, London, 2005-2010)
● MBA, London Business School (2010-2012)
● Product Manager (King, Splash Damage) (2012-2013)
● Head of Data Science (2013-present)
9. My rules for Effective Data Visualisation
1. Keep it simple
2. Keep a high data-ink ratio
3. Consistency is important
4. Mind the Context
11. This does not look great by default (but defaults are much improved, especially with seaborn).
12. publish()
1. formats the chart
2. creates the chart label (large font)
3. saves “Random Data.png” into the “Images” folder with high DPI
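The slide does not show the body of publish(), so the following is only a minimal sketch of what such a helper might look like with Matplotlib. The signature, the default folder name, the font size and the DPI value are all assumptions made for illustration.

```python
import os

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def publish(fig, title, folder="Images", dpi=300):
    """Hypothetical helper: format the chart, label it, save a high-DPI PNG."""
    ax = fig.axes[0]
    # 1. Basic formatting: hide the top and right borders.
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    # 2. Chart label in a large font (20pt is an assumed value).
    ax.set_title(title, fontsize=20)
    # 3. Save "<title>.png" into the target folder at high DPI.
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, f"{title}.png")
    fig.savefig(path, dpi=dpi, bbox_inches="tight", facecolor="white")
    return path


fig, ax = plt.subplots()
ax.plot([1, 3, 2, 4])
saved_path = publish(fig, "Random Data")
```

Taking the title as the file name keeps the label on the chart and the name of the image in the report in sync.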
13. Python Visualisations for reports
compared to Matplotlib:
1. no borders
2. double width lines
3. markers
4. Cynthia Brewer colors
5. borderless legend
6. light-grey grid lines
7. slightly darker grey on x-axis
8. ticks outside, x-axis only
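The eight differences listed above could be applied to an existing Matplotlib axes roughly as follows. This is a sketch under assumptions: the function name apply_report_style, the exact grey values and the choice of the ColorBrewer "Set1" palette are illustrative, not taken from the slides.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# First five colours of the ColorBrewer "Set1" palette (assumed choice).
BREWER_SET1 = ["#e41a1c", "#377eb8", "#4daf4a", "#984ea3", "#ff7f00"]


def apply_report_style(ax):
    """Hypothetical helper that restyles an axes for use in a report."""
    # 1 and 7: no borders, except a slightly darker grey x-axis line.
    for side in ("top", "right", "left"):
        ax.spines[side].set_visible(False)
    ax.spines["bottom"].set_color("#999999")
    # 2, 3 and 4: double-width lines, markers, Brewer colours.
    default_width = matplotlib.rcParams["lines.linewidth"]
    for i, line in enumerate(ax.get_lines()):
        line.set_linewidth(2 * default_width)
        line.set_marker("o")
        line.set_color(BREWER_SET1[i % len(BREWER_SET1)])
    # 6: light-grey grid lines, drawn behind the data.
    ax.grid(True, color="#dddddd")
    ax.set_axisbelow(True)
    # 8: ticks outside, on the x-axis only.
    ax.tick_params(axis="x", direction="out", length=4)
    ax.tick_params(axis="y", length=0)
    # 5: borderless legend, built after recolouring so the entries match.
    if any(not line.get_label().startswith("_") for line in ax.get_lines()):
        ax.legend(frameon=False)
    return ax


fig, ax = plt.subplots()
ax.plot([1, 3, 2, 4], label="series A")
ax.plot([2, 2, 3, 5], label="series B")
apply_report_style(ax)
```

Putting the style in one function means every chart in a report gets the same treatment from a single call.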
14. Python Visualisations for reports
● White background for presentations
● Avoid vector formats (.svg, .swf). Use high DPI .png
● Consistent style, colors and fonts make reports look professional
18. Dashboards, V2 - The Style Guide
❑ Charts should be 800px wide, the dashboard no wider than 1000px. Chart height: 200-300px
❑ Chart BG RGB: 238 243 250
❑ Dates should be formatted “d mmm”, e.g. “7 Jan”. Only include the year if absolutely necessary
❑ Don’t show unnecessary precision: 0.50% is the same as 0.5%
❑ Bar charts always start their axis at 0
❑ A line graph’s axis should start wherever it makes the average slope 45º
❑ Add titles for charts (centered, bold), and for axes too (if not obvious)
❑ Add “Updated at … UTC” at the bottom of the first chart in the dashboard
❑ Still looking for a perfect date selector. Use the default Tableau one, not the minimalistic one.
❑ Filters should apply to all charts in a dashboard
❑ No scrolling anywhere on the dashboard. The browser already has a scroll bar. Huge legends/filters are useless.
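The style guide targets Tableau dashboards, but the two label rules (dates as “d mmm”, no unnecessary precision) are easy to express in code. The helpers below are a hypothetical Python sketch. The names format_date and format_percent are illustrative, and format_percent assumes the value is already expressed in percent units.

```python
from datetime import date

# Fixed English abbreviations so the output does not depend on the locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_date(d, include_year=False):
    """Format a date as "d mmm", adding the year only when asked to."""
    text = f"{d.day} {MONTHS[d.month - 1]}"
    return f"{text} {d.year}" if include_year else text


def format_percent(value, max_decimals=2):
    """Render a percentage without trailing zeros."""
    text = f"{value:.{max_decimals}f}"
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    return f"{text}%"


print(format_date(date(2015, 1, 7)))   # 7 Jan
print(format_percent(0.50))            # 0.5%
```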
19. Dashboards, V2 - The Style Guide
No Version Control
Maintenance takes time
..and still no good Date Selector
26. Summary
● A good-looking visualisation is better than an ugly one
● Interactivity leads to more insights
● Consistency matters; code lets you style once
● You never really “develop from scratch” or “just use an off-the-shelf” tool
● Mind your team's capabilities and aspirations
● Don’t be limited by your existing tool(s)