The document discusses standardizing over 113 million merchant names from transaction data using regex and fuzzy matching. It involved extracting features from merchant names, cleaning names using regular expressions, fuzzy matching to group similar names, and manual rules. This allowed preliminary analysis showing 90% of transactions and spending were concentrated in 7-8% of top merchants. Customer segments were identified based on relative value added scores.
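The regex-cleaning and fuzzy-grouping steps described above might look roughly like the following. This is a minimal sketch, not the deck's actual method: the noise patterns, the 0.85 threshold, and the function names `clean_name` and `group_names` are all hypothetical, and the standard library's `difflib` stands in for whatever fuzzy matcher the authors used.

```python
import re
from difflib import SequenceMatcher

# Hypothetical cleaning rules; real ones would come from inspecting the data.
NOISE = re.compile(r"[^a-z ]+")                      # digits, punctuation, store numbers
SUFFIXES = re.compile(r"\b(inc|llc|ltd|corp|co)\b")  # common legal suffixes


def clean_name(raw: str) -> str:
    """Lowercase, strip noise characters and legal suffixes, collapse spaces."""
    name = raw.lower()
    name = NOISE.sub(" ", name)
    name = SUFFIXES.sub(" ", name)
    return " ".join(name.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] from difflib's SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


def group_names(raw_names, threshold=0.85):
    """Greedy grouping: each name joins the first group whose
    representative (a cleaned name) is at least `threshold` similar."""
    groups = {}  # representative cleaned name -> list of raw names
    for raw in raw_names:
        cleaned = clean_name(raw)
        for rep in groups:
            if similarity(cleaned, rep) >= threshold:
                groups[rep].append(raw)
                break
        else:
            groups[cleaned] = [raw]
    return groups


if __name__ == "__main__":
    sample = ["STARBUCKS #123", "Starbucks Corp.", "Shell Oil 5567"]
    for rep, members in group_names(sample).items():
        print(rep, members)
```

The greedy loop compares every name against every existing group, so it only suits small samples. At the scale of 113 million names, some form of blocking (for example, comparing only names that share a prefix) would be needed before fuzzy matching.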
Measuring and Improving CX as a PM by fmr Twilio Staff PM
Product School
- As a Product Manager, you benefit from mixing anecdotes and data to understand customer needs.
- You can track the success metrics of your product launch better by defining output metrics vs. intermediary signals of progress.
- Sometimes you need to drive stakeholder alignment and internal process changes to improve customer experience.
- You can build your own checklist for product launch and for communicating with customers to ensure you have your metrics ready, your communication is well received, and you are driving the desired customer behavior.
A short introduction to BigQuery. With this presentation you'll quickly discover:
How to load data in BigQuery
How to build a dashboard using BigQuery
How to work with BigQuery
And, last but not least, we've added some best practices.
We hope you'll enjoy this presentation and that it will help you start exploring this wonderful solution. Don't hesitate to send us your feedback or questions.
MySQL Performance Tuning. Part 1: MySQL Configuration (includes MySQL 5.7)
Aurimas Mikalauskas
Is my MySQL server configured properly? Should I run Community MySQL, MariaDB, Percona or WebScaleSQL? How many innodb buffer pool instances should I run? Why should I NOT use the query cache? How do I size the innodb log file size and what IS that innodb log anyway? All answers are inside.
Aurimas Mikalauskas is a former Percona performance consultant and architect currently writing and teaching at speedemy.com. He's been involved with MySQL since 1999, scaling and optimizing MySQL backed systems since 2004 for companies such as BBC, EngineYard, famous social networks and small shops like EstanteVirtual, Pine Cove and hundreds of others.
Additional content mentioned in the presentation can be found here: http://speedemy.com/17
The catalyst for the success of automobiles came not through the invention of the car but rather through the establishment of an innovative assembly line. History shows us that the ability to mass produce and distribute a product is the key to driving adoption of any innovation, and machine learning is no different. MLOps is the assembly line of Machine Learning and in this presentation we will discuss the core capabilities your organization should be focused on to implement a successful MLOps system.
Introduction to Google BigQuery. Slides used at the first GDG Cloud meetup in Brussels, about big data on Google Cloud Platform. (http://www.meetup.com/GDG-Cloud-Belgium/events/228206131)
Monitoring Your AWS EKS Environment with Datadog
DevOps.com
Join Datadog for a webinar on monitoring Kubernetes with a focus on Amazon EKS. You'll learn how to get the most out of Datadog's intuitive platform and EKS's unique capabilities, including:
How to monitor metrics, logs and traces from your EKS environment
How to test the usability of your environment with features such as adaptive Browser Tests and globally available Real User Monitoring
How to find and fix user-facing issues with synthetic monitoring features like adaptive Browser Tests and globally available Real User Monitoring
In this talk, I present an introduction to MLflow. I also show some examples of using it through MLflow Tracking, MLflow Projects and MLflow Models, with Databricks as an example of remote tracking.
Speaker: Jay Runkel, Principal Solution Architect, MongoDB
Session Type: 40 minute main track session
Track: Operations
When architecting a MongoDB application, one of the most difficult questions to answer is how much hardware (number of shards, number of replicas, and server specifications) am I going to need for an application. Similarly, when deploying in the cloud, how do you estimate your monthly AWS, Azure, or GCP costs given a description of a new application? While there isn’t a precise formula for mapping application features (e.g., document structure, schema, query volumes) into servers, there are various strategies you can use to estimate the MongoDB cluster sizing. This presentation will cover the questions you need to ask and describe how to use this information to estimate the required cluster size or cloud deployment cost.
What You Will Learn:
- How to architect a sharded cluster that provides the required computing resources while minimizing hardware or cloud computing costs
- How to use this information to estimate the overall cluster requirements for IOPS, RAM, cores, disk space, etc.
- What you need to know about the application to estimate a cluster size
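As the abstract notes, there is no precise formula for sizing, but the kind of back-of-the-envelope arithmetic it describes can be sketched as follows. Everything here is an illustrative assumption rather than the speaker's method: the `Workload` and `Server` fields, the `ram_fraction` and `disk_headroom` defaults, and the rule of taking the tightest of the RAM, disk, and IOPS constraints.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    data_size_gb: float    # total data plus indexes
    working_set_gb: float  # frequently accessed data plus indexes
    peak_iops: float       # disk operations per second at peak


@dataclass
class Server:
    ram_gb: float
    disk_gb: float
    iops: float


def estimate_shards(w: Workload, s: Server,
                    ram_fraction: float = 0.5,
                    disk_headroom: float = 0.7) -> int:
    """Return the shard count implied by the tightest resource.

    ram_fraction: assumed share of RAM usable for caching the working set.
    disk_headroom: assumed share of disk we are willing to fill.
    """
    by_ram = w.working_set_gb / (s.ram_gb * ram_fraction)
    by_disk = w.data_size_gb / (s.disk_gb * disk_headroom)
    by_iops = w.peak_iops / s.iops
    return max(1, math.ceil(max(by_ram, by_disk, by_iops)))


if __name__ == "__main__":
    workload = Workload(data_size_gb=2000, working_set_gb=300, peak_iops=12000)
    server = Server(ram_gb=64, disk_gb=1000, iops=5000)
    shards = estimate_shards(workload, server)
    replicas_per_shard = 3  # assumed replica set size
    print(f"{shards} shards, {shards * replicas_per_shard} servers in total")
```

In this made-up example RAM is the binding constraint, so a server type with more memory would reduce the shard count more than faster disks would.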
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
Provectus
Looking to implement MLOps using AWS services and Kubeflow? Come and learn about machine learning from the experts of Provectus and Amazon Web Services (AWS)!
Businesses recognize that machine learning projects are important but go beyond just building and deploying models, which is mostly done by organizations. Successful ML projects entail a complete lifecycle involving ML, DevOps, and data engineering and are built on top of ML infrastructure.
AWS and Amazon SageMaker provide a foundation for building infrastructure for machine learning while Kubeflow is a great open source project, which is not given enough credit in the AWS community. In this webinar, we show how to design and build an end-to-end ML infrastructure on AWS.
Agenda
- Introductions
- Case Study: GoCheck Kids
- Overview of AWS Infrastructure for Machine Learning
- Provectus ML Infrastructure on AWS
- Experimentation
- MLOps
- Feature Store
Intended Audience
Technology executives & decision makers, manager-level tech roles, data engineers & data scientists, ML practitioners & ML engineers, and developers
Presenters
- Stepan Pushkarev, Chief Technology Officer, Provectus
- Qingwei Li, ML Specialist Solutions Architect, AWS
Feel free to share this presentation with your colleagues and don't hesitate to reach out to us at info@provectus.com if you have any questions!
REQUEST WEBINAR: https://provectus.com/webinar-mlops-and-reproducible-ml-on-aws-with-kubeflow-and-sagemaker-aug-2020/
BigQuery ML - Machine learning at scale using SQL
Márton Kodok
With BigQuery ML, you can build machine learning models without leaving the data warehouse environment and train them on massive datasets. We are going to demonstrate how to build, train, evaluate, and predict with your own scalable machine learning models using standard SQL in Google BigQuery.
We will see how we can use the CREATE MODEL SQL syntax to build different models such as:
Linear regression
Multiclass logistic regression for classification
K-means clustering
Import TensorFlow models for prediction in BigQuery
We will see how we can apply these models on tabular data in retail and marketing use cases.
Models are trained and accessed in BigQuery using SQL — a language data analysts know. This enables business decision making through predictive analytics across the organization without leaving the query editor.
Making Your Hypothesis Work Harder to Inform Future Product Strategy
Optimizely
At Treatwell, each experiment goes beyond improving a single business metric. Experimentation works to evolve their product while enriching customer insights in order to deliver the best digital experience to their users. Join Laura Howard, Lead Product Manager, and Dennis Meisner, Senior Product Analyst, to learn their secret to making their hypothesis work harder and how getting their hypothesis right has improved Treatwell’s funnel progression and order health, as well as helped them make critical decisions on their product experience.
Product Strategy: Idea to Action by Coinbase Sr Product Manager
Product School
Product Management Event at #ProductCon San Francisco about Product Strategy: Idea to Action by Senior Product Manager at Coinbase, Anna Marie Clifton.
Intro to Vertex AI, unified MLOps platform for Data Scientists & ML Engineers
Daniel Zivkovic
#MLOps is a hot buzzword, just like #DevOps before it. It sparked a gold rush for software vendors, so it's hard to choose the best tool for your needs. Vertex AI is a unified MLOps platform for the entire #AI #workflow on #GoogleCloud. It is the 3rd iteration of the Google Cloud #ML platform (since its original launch), and we think they did it right (this time).
That's why #ServerlessTO invited 2 AI/ML gurus from #GCP (Jarek Kazmierczak & Brian Kang) to introduce #VertexAI to you.
The lecture recording with Q&A is at https://youtu.be/X1S7360ip-k
MEETUP "CODE-ALONG" RESOURCES
Vertex workbench - Managed and User-managed Notebooks
https://cloud.google.com/vertex-ai/docs/workbench/managed/quickstarts
Example that the training code was based on - Fashion MNIST dataset
https://www.tensorflow.org/tutorials/keras/classification
Hyperparameter tuning codelab
https://codelabs.developers.google.com/vertex_hyperparameter_tuning
Vertex pipeline codelabs
https://codelabs.developers.google.com/vertex-pipelines-intro
https://codelabs.developers.google.com/vertex-pipelines-custom-model
CI/CD slides
https://github.com/shivajid/MLOpsCICD/blob/master/presentation/AI%20Workshop%20Day4.pdf
CI/CD github example
https://github.com/shivajid/MLOpsCICD
Model monitoring example
https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/master/notebooks/official/model_monitoring/model_monitoring.ipynb
Best practices for MLOps
https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
https://cloud.google.com/resources/mlops-whitepaper
Official Vertex AI Github repository
https://github.com/GoogleCloudPlatform/vertex-ai-samples/
MEETUP CHAT LINKS
https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/master/notebooks/notebook_template.ipynb
https://github.com/GoogleCloudPlatform/vertex-ai-samples/tree/master/notebooks/official/custom
https://github.com/GoogleCloudPlatform/vertex-ai-samples/tree/master/notebooks/community/sdk
https://cloud.google.com/architecture/ml-on-gcp-best-practices#model-deployment-and-serving
https://www.youtube.com/watch?v=ntBEQdD1IeQ&list=PLd31CCJlr9FrZazLqRg1Lxq7xw9b6VNP6&index=3
PyCaret is an open-source, low-code machine learning library in Python that allows you to go from preparing your data to deploying your model within minutes in your choice of environment. This talk is a practical demo on how to use PyCaret in your existing workflows and supercharge your data science team’s productivity.
This is a summary of the State of MLOps 2021 survey published by Valohai in 2021. It compiles the answers of 100 respondents about MLOps within their organizations: roles and team sizes, areas of focus, and the areas where tooling is currently in use.
QuerySurge - the automated Data Testing solution
RTTS
QuerySurge is the leading Data Testing solution built specifically to automate the testing of Data Warehouses & Big Data. QuerySurge ensures that the data extracted from data sources remains intact in the target data store by analyzing and pinpointing any differences quickly.
And QuerySurge makes it easy for both novice and experienced team members to validate their organization's data quickly through Query Wizards while still allowing power users the flexibility they need.
All with deep-dive reporting and data health dashboards that quickly provide you with a holistic view of your project’s data.
Types of Automated Data Testing
--------------------------------------------
QuerySurge provides data testing solutions for all of your automated data testing needs
- Data Warehouse testing & ETL testing
- Big Data (Hadoop, NoSQL) testing
- Data Interface testing
- Data Migration testing
- Database Upgrade testing
FREE TRIAL
www.QuerySurge.com
Complex Analytics with NoSQL Data Store in Real Time
Nati Shalom
NoSQL data stores are often limited in the types of queries they can support due to the distributed nature of the data. In this session we will learn patterns for overcoming this limitation and combining multiple query semantics with NoSQL-based engines.
We will specifically demonstrate a combination of key/value, SQL-like, document model, and graph-based queries, as well as more advanced topics such as handling partial updates and querying through projection. We will also demonstrate how to create a mashup between those APIs, i.e. write fast through the key/value API and execute complex queries on that same data through SQL.
- See more at: http://nosql2014.dataversity.net/sessionPop.cfm?confid=81&proposalid=6335#sthash.PNSZi5TJ.dpuf
The Storyteller's Secret: 3 Keys to Mastering Storytelling to Win Hearts and ...
Carmine Gallo
Why do some ideas catch on and others don't? Inspired by his new book The Storyteller's Secret, bestselling author and master storyteller Carmine Gallo reveals how some of the most successful TED speakers and business legends use storytelling to win hearts and minds. Find out more about The Storyteller's Secret and download a free chapter at storytellerssecret.com.
We at Revolution Analytics are often asked “What is the best way to learn R?” While acknowledging that there may be as many effective learning styles as there are people we have identified three factors that greatly facilitate learning R. For a quick start:
- Find a way of orienting yourself in the open source R world
- Have a definite application area in mind
- Set an initial goal of doing something useful and then build on it
In this webinar, we focus on data mining as the application area and show how anyone with just a basic knowledge of elementary data mining techniques can become immediately productive in R. We will:
- Provide an orientation to R’s data mining resources
- Show how to use the "point and click" open source data mining GUI, rattle, to perform the basic data mining functions of exploring and visualizing data, building classification models on training data sets, and using these models to classify new data.
- Show the simple R commands to accomplish these same tasks without the GUI
- Demonstrate how to build on these fundamental skills to gain further competence in R
- Move away from small test data sets and show how, with the same level of skill, one could analyze some fairly large data sets with RevoScaleR
Data scientists and analysts using other statistical software as well as students who are new to data mining should come away with a plan for getting started with R.
Intro to Data Science for Enterprise Big Data
Paco Nathan
If you need a different format (PDF, PPT) instead of Keynote, please email me: pnathan AT concurrentinc DOT com
An overview of Data Science for Enterprise Big Data. In other words, how to combine structured and unstructured data, leveraging the tools of automation and mathematics, for highly scalable businesses. We discuss management strategy for building Data Science teams, basic requirements of the "science" in Data Science, and typical data access patterns for working with Big Data. We review some great algorithms, tools, and truisms for building a Data Science practice, and provide some great references for further study.
Presented initially at the Enterprise Big Data meetup at Tata Consultancy Services, Santa Clara, 2012-08-20 http://www.meetup.com/Enterprise-Big-Data/events/77635202/
How To Interview a Data Scientist
Daniel Tunkelang
Presented at the O'Reilly Strata 2013 Conference
Video: https://www.youtube.com/watch?v=gUTuESHKbXI
Interviewing data scientists is hard. The tech press sporadically publishes “best” interview questions that are cringe-worthy.
At LinkedIn, we put a heavy emphasis on the ability to think through the problems we work on. For example, if someone claims expertise in machine learning, we ask them to apply it to one of our recommendation problems. And, when we test coding and algorithmic problem solving, we do it with real problems that we’ve faced in the course of our day jobs. In general, we try as hard as possible to make the interview process representative of actual work.
In this session, I’ll offer general principles and concrete examples of how to interview data scientists. I’ll also touch on the challenges of sourcing and closing top candidates.
How to Become a Data Scientist
SF Data Science Meetup, June 30, 2014
Video of this talk is available here: https://www.youtube.com/watch?v=c52IOlnPw08
More information at: http://www.zipfianacademy.com
Zipfian Academy @ Crowdflower
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on November 20, 2013, at the "IBM Developer Days 2013" in Zurich, Switzerland.
ABSTRACT
There is no question that big data has hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms big data and data science. This presentation gives a professional statistician's view on these terms and illustrates the connection between data science and statistics.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
Myths and Mathemagical Superpowers of Data Scientists
David Pittman
Some people think data scientists are mythical beings, like unicorns, or they are some sort of nouveau fad that will quickly fade. Not true, says IBM big data evangelist James Kobielus. In this engaging presentation, with artwork created by Angela Tuminello, Kobielus debunks 10 myths about data scientists and their role in analytics and big data. You might also want to read the full blog by Kobielus that spawned this presentation: "Data Scientists: Myths and Mathemagical Superpowers" - http://ibm.co/PqF7Jn
For more information, visit http://www.ibmbigdatahub.com
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Data Science London
What 'kind of things' does a data scientist do? What are the foundations and principles of data science? What is a Data Product? What does the data science process look like? Learning from data: Data Modeling or Algorithmic Modeling? - talk by Carlos Somohano @ds_ldn at The Cloud and Big Data: HDInsight on Azure London 25/01/13
Transworld Systems Profit Recovery Program
jeff dorsey
We presently serve over 60,000 companies and medical facilities. In the past 5 years we have recovered over 2.4 billion dollars for our clients. Our fixed-fee approach allows our clients to put more money on their bottom line without giving up a large percentage of their profits. We would welcome an opportunity to serve your business so you don't have to make more sales or see more patients to maintain profitability. For a free analysis email me at jeff.dorsey@transworldsystems.com
Nano Dimension’s DragonFly LDM™ System is a one-stop solution for agile hardware development and innovative circuit design across a wide array of industries. It empowers companies to securely control entire development cycles through in-house additive manufacturing of PCBs and non-planar electronics with speed and precision, while reducing R&D costs. With its Lights-Out Digital Manufacturing (LDM) printing technology, this is the industry’s only comprehensive manufacturing printing platform for round-the-clock 3D printing of electronic circuitry.
Hadoop & Greenplum: Why Do Such a Thing?
Ed Kohlwey
Greenplum is using Hadoop in several interesting ways as part of a larger big data architecture with EMC Greenplum Database (a scale-out MPP SQL database) and EMC Isilon (a scale-out network-attached storage appliance). After a quick introduction of Greenplum Database and Isilon, I list some ways Greenplum is tightly integrating with Hadoop and why we would want to do such a thing. Integration points discussed include: Greenplum Database external tables to seamlessly access data in HDFS, querying HBase tables natively from Greenplum Database, Greenplum Database having its underlying storage on HDFS, and Isilon OneFS as a seamless replacement for HDFS.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses.
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed in releasing software to market, along with traditionally slow and manual security checks, has left gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerabilities and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for technology and making things work, along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Enhancing Performance with Globus and the Science DMZ
Globus
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
DevOps and Testing slides at DASA Connect
Kari Kakkonen
Slides by Rik Marselis and me from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We finished with a lovely workshop in which the participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
Monitoring and observability aren’t traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring and observability to the purview of ops, infra and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party and will share these foundational concepts to build on:
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
SCRIPT: This diagram depicts the Greenplum Unified Analytics Platform (UAP). Let’s take a high-level look at it as a stack diagram. The foundations of UAP lie in Greenplum Database for analyzing your structured data, with Greenplum Hadoop co-processing unstructured data. These two components are fused together by Greenplum gNet, which allows for parallel data exchange and parallel query access. These are overlaid with a unified data access and query layer that combines the languages of choice for your analysts (SQL, MapReduce, etc.). Over the access layer comes our powerful partner tool and services layer. We are not about locking customers into a single tool or stack. Instead we work with the tool vendor of your choice, be it SAS or R, MicroStrategy or Informatica. And what truly enables productivity and ensures you are getting maximum value out of your data scientist team is Greenplum Chorus. What sets this diagram apart from a typical vendor example is the inclusion of people: the Data Stakeholders. UAP is designed to enable an emerging group of talent, the new practitioners, that we refer to as the Data Science team. This team can include the data platform administrator, data scientists, analysts, engineers, BI teams, and most importantly the line-of-business user. We develop, package, and support this as a unified software platform available over your favorite commodity hardware, cloud infrastructure, or from our modular Data Computing Appliance. NOTES: