Presenting at the Microsoft Devs HK Meetup on 13 June, 2018
Code for presentation: https://github.com/sadukie/IntroToPyForCSharpDevs
Azure Notebook for presentation:
https://notebooks.azure.com/cletechconsulting/libraries/introtopyforcsharpdevs
I work in a Data Innovation Lab with a horde of Data Scientists. Data Scientists gather data, clean data, apply Machine Learning algorithms and produce results, all of that with specialized tools (Dataiku, Scikit-Learn, R...). These processes run on a single machine, on data that is fixed in time, and they have no constraint on execution speed.
With my fellow Developers, our goal is to bring these processes to production. Our constraints are very different: we want the code to be versioned, to be tested, to be deployed automatically and to produce logs. We also need it to run in production on distributed architectures (Spark, Hadoop), with fixed versions of languages and frameworks (Scala...), and with data that changes every day.
In this talk, I will explain how we, Developers, work hand-in-hand with Data Scientists to shorten the path to running data workflows in production.
Given at Data Day Texas 2016.
Apache Spark has been hailed as a trail-blazing new tool for doing distributed data science. However, since it's so new, it can be difficult to set up and hard to use. In this talk, I'll discuss the journey I've had using Spark for data science at Bitly over the past year. I'll talk about the benefits of using Spark, the challenges I've had to overcome, the caveats for using a cutting-edge technology such as this, and my hopes for the Spark project as a whole.
From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets... (Databricks)
The landscape of security threats an enterprise faces is vast. It is imperative for an organization to know when one of the machines within the network has been compromised. One layer of detection can take advantage of the DNS requests made by machines within the network. A request to a Command & Control (CNC) domain can act as an indication of compromise. It is thus advisable to find these domains before they come into play. The team at Akamai aims to do just that.
In this session, Aminov will share Akamai’s experience in porting their PoC detection algorithms, written in Python, to a reliable production-level implementation using Scala and Apache Spark. He will specifically cover their experience with an algorithm they developed to detect botnet domains based on passive DNS data. The session will also include some useful insights Akamai has learned while handing off solutions from research to development, including the transition from small-scale to large-scale data consumption, model export/import using PMML, and sampling techniques. This information is valuable for researchers and developers alike.
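The research-to-production handoff above hinges on exporting a model trained in one runtime so another can score with it. The talk uses PMML for this; the sketch below uses JSON purely to illustrate the idea, with hypothetical names and a toy logistic-regression scorer that must behave identically on both sides of the handoff.

```python
import io
import json
import math

# Hypothetical sketch: serialize a trained model's parameters so a different
# runtime (e.g. a Scala/Spark scorer) can reload them. PMML plays this role
# in the talk; JSON stands in here to keep the example self-contained.

def export_model(weights, bias, buf):
    buf.write(json.dumps({"weights": weights, "bias": bias}))

def score(model, features):
    # Logistic-regression style scoring; the exporting and importing sides
    # must agree exactly on this function for the handoff to be safe.
    z = model["bias"] + sum(w * x for w, x in zip(model["weights"], features))
    return 1.0 / (1.0 + math.exp(-z))

buf = io.StringIO()
export_model([0.4, -1.2], 0.1, buf)
model = json.loads(buf.getvalue())        # the "other side" reloads the model
print(round(score(model, [1.0, 0.5]), 3))
```

The key design point is that only parameters cross the boundary, never code, which is why a shared schema such as PMML is needed once models get more complex than this.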
Distributed End-to-End Drug Similarity Analytics and Visualization Workflow w... (Databricks)
The majority of a data scientist’s time is spent cleaning and organizing data before insights can be derived. Frequently, with large datasets, a lack of integration with visualization tools makes it hard to know what’s most interesting in the data and also creates challenges for validating numerical insights from models. Given the vast number of tools available in the ecosystem, it is hard to experiment with different tools to pick the most suitable one, especially given the complexity involved in integrating them with one’s solution.
The speakers will present an easy to use workflow that solves this integration challenge by combining various open source libraries, databases (e.g. Hive, Postgres, MySQL, HBase etc.) and visualization with distributed analytics. Intel developed a highly scalable library built over Apache Spark with novel graph, statistical and machine learning algorithms that also enhances the user experience of Apache Spark via easier to use APIs.
This session will showcase how to address the above mentioned issues for a drug similarity use case. We’ll go from ETL operations on raw drug data to deriving relevant features from the drug’s chemical structure using statistical and graph algorithms, using techniques to identify best model and parameters for this data to derive insights, and then demonstrating the ease of connectivity to different databases and visualization tools.
Apache Spark MLlib's Past Trajectory and New Directions with Joseph Bradley (Databricks)
This talk discusses the trajectory of MLlib, the Machine Learning (ML) library for Apache Spark. We will review the history of the project, including major trends and efforts leading up to today. These discussions will provide perspective as we delve into ongoing and future efforts within the community. This talk is geared towards both practitioners and developers and will provide a deeper understanding of priorities, directions and plans for MLlib.
Since the original MLlib project was merged into Apache Spark, some of the most significant efforts have been in expanding algorithmic coverage, adding multiple language APIs, supporting ML Pipelines, improving DataFrame integration, and providing model persistence. At an even higher level, the project has evolved from building a standard ML library to supporting complex workflows and production requirements.
This momentum continues. We will discuss some of the major ongoing and future efforts in Apache Spark based on discussions, planning and development amongst the MLlib community. We (the community) aim to provide pluggable and extensible APIs usable by both practitioners and ML library developers. To take advantage of Projects Tungsten and Catalyst, we are exploring DataFrame-based implementations of ML algorithms for better scaling and performance. Finally, we are making continuous improvements to core algorithms in performance, functionality, and robustness. We will augment this discussion with statistics from project activity.
Debugging Apache Spark - Scala & Python super happy fun times 2017 (Holden Karau)
Apache Spark is one of the most popular big data projects, offering greatly improved performance over traditional MapReduce models. Much of Apache Spark’s power comes from lazy evaluation along with intelligent pipelining, which can make debugging more challenging. Holden Karau and Joey Echeverria explore how to debug Apache Spark applications, the different options for logging in Spark’s variety of supported languages, and some common errors and how to detect them.
Spark’s own internal logging can often be quite verbose. Holden and Joey demonstrate how to effectively search logs from Apache Spark to spot common problems and discuss options for logging from within your program itself. Spark’s accumulators have gotten a bad rap because of how they interact in the event of cache misses or partial recomputes, but Holden and Joey look at how to effectively use Spark’s current accumulators for debugging before gazing into the future to see the data property type accumulators that may be coming to Spark in future versions. And in addition to reading logs and instrumenting your program with accumulators, Spark’s UI can be of great help for quickly detecting certain types of problems. Holden and Joey cover how to quickly use the UI to figure out if certain types of issues are occurring in our job.
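The accumulator pattern described above can be shown in miniature without Spark: a side-channel counter records malformed records while the main transformation proceeds. In real Spark this would be an accumulator created via the SparkContext; the plain counter below is only a stand-in to show the shape of the technique.

```python
# Sketch of the accumulator debugging pattern: count bad records as a side
# effect of a transformation, without aborting the job. Names are illustrative.

class Accumulator:
    def __init__(self):
        self.value = 0

    def add(self, n=1):
        self.value += n

bad_records = Accumulator()

def parse(line):
    try:
        return int(line)
    except ValueError:
        bad_records.add()   # recorded out-of-band; the pipeline keeps going
        return None

lines = ["10", "oops", "7", ""]
parsed = [v for v in (parse(l) for l in lines) if v is not None]
print(parsed, bad_records.value)
```

Note that the caveat from the talk applies to the real thing: because Spark may recompute partitions on cache misses or task retries, a Spark accumulator used this way can over-count, which is exactly why accumulators "have gotten a bad rap" for debugging.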
Big Data Processing with Apache Spark 2014 (mahchiev)
Apache Spark™ is a fast and general engine for large-scale data processing. It has gained enormous popularity recently with its speed and ease of use and is currently replacing traditional Hadoop MapReduce. We'll talk about:
1. What is Big Data?
2. The MapReduce paradigm
3. What does Apache Spark do?
4. Finally, we'll make a quick demo
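The MapReduce paradigm from point 2 can be sketched in a single process: map emits key-value pairs, a shuffle groups them by key, and reduce folds each group. This toy word count only illustrates the shape of the computation, not a distributed implementation.

```python
from collections import defaultdict
from itertools import chain

# Minimal single-process MapReduce: map -> shuffle (group by key) -> reduce.

def map_phase(doc):
    # Emit (word, 1) for every word in the document.
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # Group all values by key, as the framework's shuffle step would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Fold each group of values into a single result per key.
    return {key: sum(values) for key, values in groups.items()}

docs = ["spark makes big data simple", "big data big insights"]
counts = reduce_phase(shuffle(chain.from_iterable(map_phase(d) for d in docs)))
print(counts)
```

Spark generalizes this model: the same map and reduce steps run over partitioned data on a cluster, with intermediate results kept in memory rather than written to disk between stages.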
Scala and Spark are Ideal for Big Data - Data Science Pop-up Seattle (Domino Data Lab)
Scala and Spark are each great tools for data processing, and they work well together. They can process data via small, simple interactive queries as well as in very large, highly available, and scalable production systems. They provide an integrated framework for an ever-growing range of data processing capabilities. We examine the reasons for this and also look at a couple of simple data processing examples written in Scala. Presented by John Nestor, Sr. Architect at 47 Degrees.
AnalyticsConf2016 - Advanced Analytics on the Azure HDInsight Platform (Łukasz Grala)
A session on Microsoft's Big Data Analytics solution: Hortonworks (Hadoop, HBase, Storm, Spark) together with the high-performance R Server. Advanced analytics using RevoScaleR.
Whirlpools in the Stream with Jayesh Lalwani (Databricks)
At Capital One, we use Spark to detect fraud. Recently we have started implementing real-time fraud detection using machine-learned models. One of Capital One’s fraud detection microservices was an early adopter of Structured Streaming. As part of this implementation, the microservice ran into several roadblocks. In this talk, we describe those roadblocks and how we got around them.
Caching of lookup data
Dilemma: Store state in Spark vs store state in database
Retrieve state from database efficiently
Non-homogenous data sources
Aggregations in the stream
Checkpointing fumbles
Checkpointing performance and instabilities
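The first pain point in the list, caching of lookup data, is commonly handled with a time-to-live cache in front of the slow store so a streaming job does not hit the database on every micro-batch. The sketch below is one illustrative approach, not Capital One's implementation; all names and the TTL policy are assumptions.

```python
import time

# A small TTL cache in front of a slow lookup (e.g. a database call).

class TTLCache:
    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self.loader = loader          # slow path: fetch from source of truth
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}              # key -> (expiry_time, value)

    def get(self, key):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]             # fresh cache hit: no database round-trip
        value = self.loader(key)      # miss or expired: refresh from the store
        self._store[key] = (now + self.ttl, value)
        return value

calls = []
cache = TTLCache(loader=lambda k: calls.append(k) or k.upper(), ttl_seconds=60)
print(cache.get("fraud_rule"), cache.get("fraud_rule"), len(calls))
```

The trade-off this embodies is the dilemma from the list above: cached state can be stale for up to the TTL, which is acceptable for slowly changing lookup data but not for per-transaction state.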
Fulfilling Apache Arrow's Promises: Pandas on JVM memory without a copy (Uwe Korn)
Apache Arrow's promise was to reduce the (serialization & copy) overhead of working with columnar data between different systems. Using the latest Pandas release and Arrow's ability to share memory between the JVM and Python as ingredients, we demonstrate that Arrow can fulfill this bold statement. The performance benefits of this will be shown using a typical data engineering use-case that produces data in the JVM and then passes it on to a Python-based machine learning model.
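The zero-copy principle behind Arrow can be illustrated in miniature with the standard library: a memoryview exposes an existing buffer without duplicating its bytes, so "handing data over" costs O(1) rather than a serialize/copy/deserialize round-trip. Arrow applies the same principle across the JVM/Python boundary via a shared columnar memory layout; this sketch only demonstrates buffer sharing within one process.

```python
# Buffer sharing without copying: both names reference the same memory.
data = bytearray(b"columnar-data")
view = memoryview(data)       # no copy is made here

data[0:8] = b"COLUMNAR"       # mutate through the original buffer...
print(bytes(view[:8]))        # ...and the change is visible through the view
```

With Arrow, the "view" is a Pandas DataFrame and the "buffer" lives in JVM memory, which is what makes the JVM-produces/Python-consumes pipeline in this talk cheap.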
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue (Databricks)
Hyperparameter tuning is critical in model development. Its general form, parameter tuning with an objective function, is also widely used in industry. On the other hand, Apache Spark can handle massive parallelism, and Apache Spark ML is a solid machine learning solution.
But we have not seen a general and intuitive distributed parameter tuning solution based on Apache Spark, why?
Not every tuning problem is on Apache Spark ML models. How can Apache Spark handle general models?
Not every tuning problem is a parallelizable grid or random search. Bayesian optimization is sequential, how can Apache Spark help in this case?
Not every tuning problem is single-epoch; deep learning is not. How do we fit algorithms such as Hyperband and ASHA into Apache Spark?
Not every tuning problem is a machine learning problem; for example, simulation plus tuning is also common. How do we generalize?
In this talk, we are going to show how using Fugue-Tune and Apache Spark together can eliminate these pain points.
Fugue-Tune, like Fugue, is a “super framework” – an abstraction layer unifying existing solutions such as Hyperopt and Optuna.
It first models the general tuning problem, independent of machine learning.
It is designed for both small- and large-scale problems. It can always fully parallelize the distributable part of a tuning problem.
It works for both classical and deep learning models. With Fugue, running hyperband and ASHA becomes possible on Apache Spark.
In the demo, you will see how to do any type of tuning in a consistent, intuitive, scalable and minimal way. And you will see a live demo of the amazing performance.
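The "fully parallelize the distributable part" claim above rests on the fact that a grid of candidate parameters is embarrassingly parallel: each configuration can be scored independently. The sketch below shows that structure with a thread pool standing in for a Spark cluster; the objective function and grid are illustrative, not from Fugue-Tune.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Toy objective: minimized at x=3, y=-1. A real objective would train and
# evaluate a model with the given parameters.
def objective(params):
    x, y = params["x"], params["y"]
    return (x - 3) ** 2 + (y + 1) ** 2

# The candidate grid: every configuration is independent of the others.
grid = [{"x": x, "y": y} for x, y in product(range(5), range(-3, 3))]

# Score all candidates in parallel; Spark would map this over a cluster.
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = list(pool.map(objective, grid))

best = grid[min(range(len(grid)), key=scores.__getitem__)]
print(best)
```

Bayesian optimization breaks this picture because each trial depends on previous results; that is exactly the sequential case the questions above raise, where a framework has to mix parallel batches with sequential model updates.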
Streaming Trend Discovery: Real-Time Discovery in a Sea of Events with Scott ... (Databricks)
Time is the one thing we can never get in front of. It is rooted in everything, and “timeliness” is now more important than ever especially as we see businesses automate more and more of their processes. This presentation will scratch the surface of streaming discovery with a deeper dive into the telecommunications space where it is normal to receive billions of events a day from globally distributed sub-systems and where key decisions “must” be automated.
We’ll start out with a quick primer on telecommunications, an overview of the key components of our architecture, and make a case for the importance of “ringing”. We will then walk through a simplified solution for doing windowed histogram analysis and labeling of data in flight using Spark Structured Streaming and mapGroupsWithState. I will walk through some suggestions for scaling up to billions of events, managing memory when using the Spark StateStore, as well as how to avoid pitfalls with the serialized data stored there.
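Windowed histogram analysis of the kind described can be sketched without Spark: events carry a timestamp and a value, values are assigned to buckets, and a histogram is kept per fixed time window. This is the same shape of state a mapGroupsWithState handler would keep per key; the window size and bucket edges below are illustrative assumptions.

```python
from collections import Counter, defaultdict

WINDOW = 60                    # seconds per tumbling window (illustrative)
BUCKETS = [0, 10, 100, 1000]   # lower edges of the value buckets

def bucket(value):
    # Return the largest lower edge not exceeding the value.
    label = BUCKETS[0]
    for edge in BUCKETS:
        if value >= edge:
            label = edge
    return label

def windowed_histograms(events):
    # window start time -> histogram (bucket -> count)
    state = defaultdict(Counter)
    for ts, value in events:
        window_start = ts - ts % WINDOW
        state[window_start][bucket(value)] += 1
    return state

events = [(5, 3), (42, 250), (61, 12), (80, 7)]
hists = windowed_histograms(events)
print(dict(hists[0]), dict(hists[60]))
```

In the streaming version, the per-window Counter is what lives in the StateStore, which is why the talk's advice on state size and serialization matters: every key holds one of these structures until its window expires.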
What you’ll learn:
1. How to use the new features of Spark 2.2.0 (mapGroupsWithState / StateStore)
2. How to bucket and analyze data in the streaming world
3. How to avoid common serialization mistakes (e.g. how to upgrade application code and retain stored state)
4. More about the telecommunications space than you’ll probably want to know!
5. Learn a new approach to building applications for enterprise and production.
Assumptions:
1. You know Scala – or want to know more about it.
2. You have deployed spark to production at your company or want to
3. You want to learn some neat tricks that may save you tons of time!
Take Aways:
1. A fully functioning Spark app – with unit tests!
Skutil - H2O meets Sklearn - Taylor Smith (Sri Ambati)
Skutil brings the best of both worlds to H2O and sklearn, delivering an easy transition into the world of distributed computing that H2O offers, while providing the same, familiar interface that sklearn users have come to know and love.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
A short introduction to more advanced Python and programming in general. Intended for users who have already learned basic coding skills but want a rapid tour of the more in-depth capabilities offered by Python, along with some general programming background.
Exercises are available at: https://github.com/chiffa/Intermediate_Python_programming
Best Python Online Training with Live Project by Expert (QA TrainingHub)
QA Training Hub is a leading Python programming online training center in India. Python online training is provided by Mr. Dinesh, a working professional, data scientist, and RPA expert with 18+ years of industry experience teaching Python. Visit: http://www.qatraininghub.com/python-online-training.php Contact: Mr. Dinesh Raju – India: +91-8977262627, USA: +1-845-493-5018, Mail: info@qatraininghub.com
When working with credentials for Azure resources, you want to avoid storing them in repositories whenever possible. In this session, we will talk about some of the options for working with credentials in Azure development without checking them into repositories - including managed identities, DefaultAzureCredential, and ChainedTokenCredential.
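The chaining pattern behind ChainedTokenCredential (and DefaultAzureCredential, which is a pre-built chain) can be sketched with plain Python: try each credential source in order and return the first that succeeds, so the same code authenticates via environment variables locally and via a managed identity in Azure. The credential sources below are stand-ins, not the azure-identity API.

```python
# Illustrative sketch of credential chaining; names are hypothetical.

class CredentialUnavailable(Exception):
    pass

class ChainedCredential:
    def __init__(self, *sources):
        self.sources = sources

    def get_token(self):
        errors = []
        for source in self.sources:
            try:
                return source()                  # first success wins
            except CredentialUnavailable as exc:
                errors.append(str(exc))          # remember why, keep trying
        raise CredentialUnavailable("; ".join(errors))

def env_credential():
    # Stand-in for reading credentials from environment variables.
    raise CredentialUnavailable("no env vars set")

def managed_identity_credential():
    # Stand-in for a managed identity token request from inside Azure.
    return "token-from-managed-identity"

chain = ChainedCredential(env_credential, managed_identity_credential)
print(chain.get_token())
```

The point of the design is that no secret ever appears in the code or the repository: each source pulls its material from the runtime environment, and the chain just encodes the fallback order.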
Databricks is a popular tool used with large amounts of data, applying to many roles - including data analysts, data engineers, data scientists, and machine learning engineers. It can be found on many cloud platforms - including Azure, AWS, and GCP. In this talk, we will look at a flight-themed end-to-end solution using Azure Databricks, Azure Data Factory, Azure Storage, and Power BI. By the end of this session, you will have a better understanding of Databricks' capabilities and how it integrates with other Azure offerings.
Noodling Data with Jupyter Notebook - presented at various user groups in 2020, both in this format and for Azure Notebooks; also available as a Jupyter Notebook to be presented with the RISE slideshow.
What is UX and why should we care as developers? This talk explores these concepts from a developer's perspective. Presented at Kansas City Developer Conference 2017 on August 4, 2017
Without users & their problems, we have no reason to write software. However, sometimes, it is frustrating dealing with the source of our problems. Thankfully, there are tools to help us become better at communicating with our end users, in hopes of achieving the end goal with as little strife as possible. Empathy, patience, and clear communication go a long way in development, as this talk will show. “Even More Tools for the Developer’s UX Toolbelt” will give developers even more tools to make their lives a little easier when dealing with end users.
Tips & tricks *for developers, by a developer* on how to work with end users and the business, making software development a bit easier.
This was delivered at Link State 2014 at Case Western Reserve University in Cleveland, OH on September 20, 2014.
UiPath Test Automation using UiPath Test Suite series, part 3 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Smart TV Buyer Insights Survey 2024 (91mobiles)
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Search and Society: Reimagining Information Access for Radical Futures (Bhaskar Mitra)
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
2. A C# Dev’s Guide to Python
Presented by:
Sarah Dutkiewicz
Microsoft MVP, Visual Studio and
Development Technologies
Microsoft Developers HK
13 June, 2018
3. About the Presenter
• 9-time Microsoft Most Valuable
Professional – 2 years in Visual C#, 7 years
in Visual Studio and Development Tools
• Bachelor of Science in Computer Science &
Engineering Technology
• Published author of a PowerShell book
• Live coding stream guest on Fritz and
Friends and DevChatter
• Why Hong Kong? #ancestraltrip!
4. The Python Community
Python’s community is vast;
diverse & aims to grow;
Python is Open.
https://www.python.org/community/
5. Diversity in Python
The Python Software Foundation and the global Python
community welcome and encourage participation by
everyone. Our community is based on mutual respect,
tolerance, and encouragement, and we are working to help
each other live up to these principles. We want our
community to be more diverse: whoever you are, and
whatever your background, we welcome you.
https://www.python.org/community/diversity/
6. The Zen of Python,
by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
https://www.python.org/dev/peps/pep-0020
7. Areas Using Python
• Analysis
• Computation
• Math
• Science
• Statistics
• Engineering
• Deep Learning
• Artificial Intelligence
• Machine Learning
• Data Science
8. Weakness of Python
• The Great Python Schism
• Hard version split between 2.x and 3.x
• Some people are stuck on 2.x due to dependencies following the 2.x line
• Greatly opinionated
There should be one-- and preferably only one --obvious way to do it.
9. Why Python 3?
• Python 2 struggles with text and binary data
• ‘abcd’ is both a string consisting of letters (textual) and a string consisting of
bytes (binary)
• Goes against the “preferably one way” part of the Zen of Python
• Doesn’t do well with Unicode
• Python was out before Unicode was a standard
• Not all projects in Python 2 support Unicode equally
• Python 3
• unicode/str/bytes types
• Backwards-incompatible – but very much necessary, as Python is a language
of the world
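A minimal sketch of the text/binary split the slide describes: in Python 2, 'abcd' could be read as either text or bytes, while Python 3 keeps str and bytes as distinct types that must be converted explicitly.

```python
text = "abcd"    # str: a sequence of Unicode characters
data = b"abcd"   # bytes: a sequence of raw bytes

print(type(text))   # <class 'str'>
print(type(data))   # <class 'bytes'>

# Converting between the two requires an explicit encoding
encoded = text.encode("utf-8")   # str -> bytes
decoded = data.decode("utf-8")   # bytes -> str
print(encoded == data)           # True
print(decoded == text)           # True
```

In Python 3, comparing or concatenating a str with bytes fails instead of guessing, which is exactly the "refuse the temptation to guess" behavior the Zen calls for.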
10. Python Enhancement Proposals (PEPs)
• https://www.python.org/dev/peps/
• Purpose and Guidelines for PEPs
• Guidelines for Language Evolution
• Deprecation of Standard Modules
• Bug Fixes
• Style Guides
• Docstring Conventions
• API for crypto
• API for Python database
• Python release schedules
• … and more!
11. Other Terms…
• Benevolent Dictator for Life (BDFL) – Guido van
Rossum, father of Python
• Pythonista – Python developer
• Pythonic – code follows common guidelines, written
in idiomatic Python
• Pythoneer – pioneers of Python, leaders who create
change
• A Pythoneer can be a Pythonista, but not all
Pythonistas are Pythoneers.
12. Some Tools to Know
• Visual Studio with Python Tools
• Visual Studio Code
• Azure Notebooks
• Repl.it
• Jupyter Notebooks
• PyCharm
13. Package Management
• Think NuGet only for Python
• Pip (Pip Installs Packages)
• Python’s official package manager
• Virtualenv
• Install pip packages in an isolated manner
• Conda – conda.io
• Not Python-specific – a cross-platform option similar
to apt and yum
• Part of Miniconda
• Just conda and its dependencies
• Also part of Anaconda
• Conda, its dependencies, and many packages helpful in
data science applications
• More than one? Isn’t this anti-Zen? Yes,
but…
http://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/
14. Presentation Breakdown
• Simple Python demos in a Jupyter Notebook – to be shared in
an Azure Notebook
• Variables
• Conditional Structures
• Loops
• Functions
• Exception Handling
• Azure Notebook Library:
https://notebooks.azure.com/cletechconsulting/libraries/introtopyforcsharpdevs
• More complex code using Visual Studio Community Edition
with the Python Tools and/or Visual Studio Code
• GitHub repo:
https://github.com/sadukie/IntroToPyForCSharpDevs
15. What version of Python am I running?
• Command line: python -V
• Within a Python environment:
import sys
sys.version
16. Key Points from Python Style Guide (PEP 8)
• Indentation – 4 spaces
• Optional for continuation lines
• Make it readable and clearly identifiable
• If tabs are already in use, continue with tabs
• Do not mix tabs and spaces!
• Maximum line length should be 79 characters
• Easy for side-by-side files
• Works well for code review situations
• Docstrings and comments should be limited to 72 characters
• Imports on separate lines, always at the top
• Be consistent with quoting – single-quoted and double-quoted strings are the same.
• Read more at https://www.python.org/dev/peps/pep-0008/#imports
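A small hypothetical snippet (the circle_area function and DEFAULT_RADIUS constant are illustrative, not from PEP 8 itself) showing several of these rules at once:

```python
# Imports on separate lines, always at the top of the file
import math

# Module-level constants in UPPER_CASE
DEFAULT_RADIUS = 1.0


def circle_area(radius=DEFAULT_RADIUS):
    """Return the area of a circle with the given radius.

    Docstrings and comments wrap at 72 characters; code lines
    stay within 79 so files sit comfortably side by side.
    """
    # Indent with 4 spaces; never mix tabs and spaces.
    return math.pi * radius ** 2


print(circle_area(2))
```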
17. DEMO: Basics of Python
If Internet is present: Azure Notebooks
If Internet is not present: Jupyter Notebooks
19. Object Orientation
• Object oriented from the beginning
• Classes with:
• Data members (class variables and instance variables)
• Methods
• Class Variables vs Instance Variables
• Class variables are accessed for all instances of a class
• Within a class, outside of methods
• Not common
• Instance variables are managed by the instance
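A minimal sketch of the distinction (the Dog class and its attributes are illustrative): class variables live on the class and are seen by every instance, while instance variables are set on self and belong to one object.

```python
class Dog:
    # Class variable: defined within the class, outside any method,
    # and shared by all instances
    species = "Canis familiaris"

    def __init__(self, name):
        # Instance variable: managed by each instance
        self.name = name

fido = Dog("Fido")
rex = Dog("Rex")
print(fido.species, rex.species)   # both read the same class variable
print(fido.name, rex.name)         # each instance keeps its own name

Dog.species = "Canis lupus familiaris"   # change the class variable...
print(rex.species)                       # ...and every instance sees it
```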
20. Inheritance
• Can inherit from multiple classes
• Can check relationships with isinstance() and issubclass()
• Parent class is accessed via super() method call
• Typical to call parent’s __init__() from within child’s __init__() before
moving on in the child’s initialization method
• Child knows about its parents through its __bases__ attribute
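A short sketch of these points (Animal and Dog are illustrative names): the child calls the parent's __init__() via super() before its own initialization, and the relationship is visible through isinstance(), issubclass(), and __bases__.

```python
class Animal:
    def __init__(self, name):
        self.name = name

class Dog(Animal):
    def __init__(self, name, breed):
        super().__init__(name)   # call the parent initializer first
        self.breed = breed       # then finish the child's initialization

d = Dog("Rex", "Beagle")
print(isinstance(d, Animal))     # True
print(issubclass(Dog, Animal))   # True
print(Dog.__bases__)             # (<class '__main__.Animal'>,)
```

Multiple inheritance works the same way, with every parent listed in the class declaration and reported in __bases__.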
21. Interfaces
• Not necessary in Python
• No interface keyword in Python
• Try to invoke a method we expect
• Exception handling
• hasattr checking
• Duck typing
If it talks and walks like a duck, then it is a duck
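A minimal duck-typing sketch (Duck, Person, and make_it_quack are illustrative names): instead of requiring an interface, we either check with hasattr or simply invoke the method we expect and handle the exception.

```python
class Duck:
    def quack(self):
        return "Quack!"

class Person:
    def quack(self):
        return "I'm quacking!"

def make_it_quack(thing):
    # hasattr checking: any object with a quack() method qualifies
    if hasattr(thing, "quack"):
        return thing.quack()
    # The alternative style is to just call thing.quack() and
    # catch the AttributeError (exception handling)
    raise TypeError("not duck-like")

print(make_it_quack(Duck()))     # Quack!
print(make_it_quack(Person()))   # I'm quacking!
```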
22. Metaclasses
• Things typically defined in the language specification in other
languages
• Classes’ classes
• Class factories!
• Can be set via a __metaclass__ attribute (Python 2)
• In Python 3, declared with metaclass= in the class declaration, after any
base classes
• Most classes have the metaclass of type
• Traverse the __class__ tree enough, and you’ll end at type
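A small sketch of a metaclass acting as a class factory (Meta and Widget are illustrative names): the metaclass's __new__ runs when the class itself is created and can modify its namespace.

```python
class Meta(type):
    def __new__(mcls, name, bases, namespace):
        # The class factory at work: inject an attribute into
        # every class built by this metaclass
        namespace["created_by"] = mcls.__name__
        return super().__new__(mcls, name, bases, namespace)

class Widget(metaclass=Meta):   # Python 3 metaclass= declaration
    pass

print(type(Widget))                    # <class '__main__.Meta'>
print(Widget.created_by)               # Meta
print(Widget().__class__.__class__)    # traversing __class__ reaches Meta
print(type(Meta))                      # ...and eventually type
```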
23. Abstract Base Classes
• ABCs!
• abc module
• ABCMeta metaclass
• Use the pass keyword to not define the method’s body
• Must also use the @abc.abstractmethod decorator
• Can register classes as virtual subclasses of ABCs
• Only useful for categorization
• Does not know anything about its parent – nothing in __bases__
• Can throw errors if methods aren’t implemented
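A short sketch of these points using the abc module (Shape, Square, and Circleish are illustrative names; abc.ABC is a convenience base class equivalent to setting metaclass=ABCMeta):

```python
import abc

class Shape(abc.ABC):
    @abc.abstractmethod
    def area(self):
        pass   # body deferred to concrete subclasses

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

print(Square(3).area())   # 9

# Instantiating the ABC itself raises a TypeError
try:
    Shape()
except TypeError as exc:
    print(exc)

# A registered virtual subclass: categorized as a Shape, but the
# ABC never appears in its __bases__
class Circleish:
    def area(self):
        return 3.14

Shape.register(Circleish)
print(issubclass(Circleish, Shape))   # True
print(Circleish.__bases__)            # (<class 'object'>,)
```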
26. Magic Methods
• Key concept to understand for OO Python
• Method names are surrounded by double underscores (“dunders”)
• Sometimes called dunder methods
• Object’s lifespan in magic methods
• __new__ - redefined rarely; used to create new instances; phase 1 of the
constructor
• __init__ - initializer for the class; passed the instance; most commonly used in
Python class definitions
• __del__ - the destructor; no guarantee that __del__ will be executed
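A minimal sketch of the lifecycle (the Resource class is illustrative): __new__ creates the instance, __init__ receives it for initialization, and __del__ may run when the object is garbage-collected, with no guarantee about when or whether.

```python
class Resource:
    def __new__(cls, *args, **kwargs):
        # Phase 1 of construction: create the instance (rarely redefined)
        print("__new__: creating the instance")
        return super().__new__(cls)

    def __init__(self, name):
        # Phase 2: initialize the already-created instance
        print("__init__: initializing", name)
        self.name = name

    def __del__(self):
        # The "destructor" -- execution is not guaranteed
        print("__del__:", self.name)

r = Resource("db-connection")
del r   # usually triggers __del__ in CPython, but timing is not promised
```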
30. Desktop Application Development
• Tkinter (“Tk interface”) – de facto GUI toolkit in Python for writing
desktop apps based on Tcl/Tk
• PyQt – Python package for writing desktop apps based on Qt
• If you prefer GTK:
• PyGObject
• pygtk
http://www.pygame.org
https://kivy.org
35. SQL Server 2017 & Machine Learning
• Run Python in the server
• Brings computation to the data
• revoscalepy: https://docs.microsoft.com/en-us/machine-learning-server/python-reference/revoscalepy/revoscalepy-package
36. Learn More!
• Seminar of Machine Learning in Python – Open Source Hong Kong –
led by Delon Yau, Software Engineer, Microsoft -
https://www.meetup.com/opensourcehk/events/251121245/
• Getting Started with Python in Visual Studio Code:
https://code.visualstudio.com/docs/python/python-tutorial
• Python Tools for Visual Studio:
https://www.visualstudio.com/vs/features/python/
• Python at Microsoft blog:
https://blogs.msdn.microsoft.com/pythonengineering/
Abstract: As technology continues to evolve, our toolset as developers evolves as well. While we can use C# for many things, other languages are growing in popularity in other areas - such as Python being used in AI, ML, and other aspects of data science. In this session, we will see how we do things in Python compared to what we do in C#. Some of the tools we will look at include Anaconda with Visual Studio Code and Visual Studio's Python tooling.