This document provides a summary of a presentation on Python and its role in big data analytics. It discusses Python's origins and growth, key packages like NumPy and SciPy, and new tools being developed by Continuum Analytics like Numba, Blaze, and Anaconda to make Python more performant for large-scale data processing and scientific computing. The presentation outlines Continuum's vision of an integrated platform for data analysis and scientific work in Python.
Making NumPy-style and Pandas-style code faster and able to run in parallel. Continuum has been working on scaled versions of NumPy and Pandas for 4 years. This talk describes how Numba and Dask provide scaled Python today.
Using Anaconda to light up dark data. My talk given to the Berkeley Institute for Data Science describing Anaconda and the Blaze ecosystem for bringing a virtual analytical database to your data.
With Anaconda (in particular Numba and Dask) you can scale up your NumPy and Pandas stack to many CPUs and GPUs, as well as scale out to run on clusters of machines, including Hadoop.
With Dask and Numba, you can write NumPy-like and Pandas-like code and have it run very fast on multi-core systems as well as at scale on many-node clusters.
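To make this concrete, here is a minimal Dask sketch (my illustration, not taken from these talks) of NumPy-style code running in parallel across cores:

import dask.array as da

# A large array split into chunks that Dask schedules across all cores.
x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))

# NumPy-style expression; nothing is computed until .compute() is called.
result = (x + x.T).mean(axis=0)
print(result.compute())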
Accelerating Data Analysis of Brain Tissue Simulations with Apache Spark with... (Databricks)
In recent years, increasing computational power has made possible larger scientific experiments with high computing demands, such as brain tissue simulations. In general, larger simulations imply generating larger amounts of data that must then be analyzed by neuroscientists. Currently, neuroscientists analyze simulation reports with Python scripts, thanks to Python's programming simplicity and the performance of the NumPy library.
However, this analysis workflow will become unfeasible in the near future, as we foresee a 10x increase in dataset size over the next year. Therefore, we are exploring how to accelerate data analysis of brain activity simulations with big data technologies like Spark. In this talk, we will present how we address this challenge: from building RDDs/DataFrames from custom binary files to data queries and transformations that achieve the desired scientific analyses. To reach our goals, we have implemented our workflow in five different ways, combining RDDs, DataFrames, different data structures and representations, and different data partitioning.
After significant engineering and programming effort, we would like to share our lessons learned with the community: how Spark features can support data analysis in our neuroscience research area, and what types of decisions can negatively impact performance. We would also like to open a discussion about some critical limitations we have found in Spark as applied to our use cases, and how to address them in the future as a joint community effort. In brief, as takeaway messages, we will highlight the suitability of Spark for our data analysis, how data generation can strongly affect subsequent data analysis, and how the choice of data types and formats can have a significant impact on Spark performance. We will present our experiments run on Cooley, the Argonne National Laboratory (ANL) data analysis cluster.
The amount of data available to us is growing rapidly, but what is required to make useful conclusions out of it?
Outline
1. Different tactics to gather your data
2. Cleansing, scrubbing, correcting your data
3. Running analysis on your data
4. Bringing your data to life with visualizations
5. Publishing your data for the rest of us as linked open data
PyData: Past, Present, Future (PyData SV 2014 Keynote) (Peter Wang)
From the closing keynote: a look back at the last two years of PyData, a discussion of Python's role in the growing and changing data analytics landscape, and encouragement of ways to grow the community.
Enabling Python to be a Better Big Data Citizen (Wes McKinney)
These slides are from my talk at the NYC Python Meetup at the ODSC Office NYC on February 17, 2016. It discusses Python's architectural challenges in interoperating with the Hadoop ecosystem and how a new project, Apache Arrow, will help.
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit... (Databricks)
We’ve all heard that AI is going to become as ubiquitous in the enterprise as the telephone, but what does that mean exactly?
Everyone in IBM has a telephone; and everyone knows how to use her telephone; and yet IBM isn’t a phone company. How do we bring AI to the same standard of ubiquity — where everyone in a company has access to AI and knows how to use AI; and yet the company is not an AI company?
In this talk, we’ll break down the challenges a domain expert faces today in applying AI to real-world problems. We’ll talk about the challenges that a domain expert needs to overcome in order to go from “I know a model of this type exists” to “I can tell an application developer how to apply this model to my domain.”
We'll conclude the talk with a live demo that showcases how a domain expert can cut through the five stages of model deployment in minutes instead of days using IBM and other open source tools.
10 concepts the enterprise decision maker needs to understand about Hadoop (Donald Miner)
Way too many enterprise decision makers have clouded and uninformed views of how Hadoop works and what it does. Donald Miner offers high-level observations about Hadoop technologies and explains how Hadoop can shift the paradigms inside of an organization, based on his report Hadoop: What You Need To Know—Hadoop Basics for the Enterprise Decision Maker, forthcoming from O’Reilly Media.
After a basic introduction to Hadoop and the Hadoop ecosystem, Donald outlines 10 basic concepts you need to understand to master Hadoop:
Hadoop masks being a distributed system: what it means for Hadoop to abstract away the details of distributed systems and why that’s a good thing
Hadoop scales out linearly: why Hadoop’s linear scalability is a paradigm shift (but one with a few downsides)
Hadoop runs on commodity hardware: an honest definition of commodity hardware and why this is a good thing for enterprises
Hadoop handles unstructured data: why Hadoop is better for unstructured data than other data systems from a storage and computation perspective
In Hadoop, you load data first and ask questions later: the differences between schema-on-read and schema-on-write and the drawbacks this represents
Hadoop is open source: what it really means for Hadoop to be open source from a practical perspective, not just a “feel good” perspective
HDFS stores the data but has some major limitations: an overview of HDFS (replication, not being able to edit files, and the NameNode)
YARN controls everything going on and is mostly behind the scenes: an overview of YARN and the pitfalls of sharing resources in a distributed environment and the capacity scheduler
MapReduce may be getting a bad rap, but it’s still really important: an overview of MapReduce (what it’s good at and bad at and why, while it isn’t used as much these days, it still plays an important role)
The Hadoop ecosystem is constantly growing and evolving: an overview of current tools such as Spark and Kafka and a glimpse of some things on the horizon
Donald Miner will give a quick introduction to Apache Hadoop, then discuss the different ways Python can be used to get the job done in Hadoop. This includes writing MapReduce jobs in Python in various ways, interacting with HBase, writing custom behavior in Pig and Hive, interacting with the Hadoop Distributed File System, using Spark, and integrating with other corners of the Hadoop ecosystem. The state of Python with Hadoop is far from stable, so we'll spend some honest time talking about the state of these open source projects and what's still missing.
Outlines the vision and philosophy for Wakari.io with a basic overview of popular python data analysis packages. Most of the talk is conducted in Wakari and is not visible on these slides. 90 minutes for PyData NYC, November 8th 2013.
Hadoop or Spark: is it an either-or proposition? (Slim Baltagi)
Hadoop or Spark: is it an either-or proposition? An exodus away from Hadoop to Spark is picking up steam in the news headlines and talks! Away from marketing fluff and politics, this talk analyzes such news and claims from a technical perspective.
In practical ways, while referring to components and tools from both Hadoop and Spark ecosystems, this talk will show that the relationship between Hadoop and Spark is not of an either-or type but can take different forms such as: evolution, transition, integration, alternation and complementarity.
Travis Oliphant, "Python for Speed, Scale, and Science" (Fwdays)
Python is sometimes discounted as slow because of its dynamic typing and interpreted nature and not suitable for scale because of the GIL. But, in this talk, I will show how with the help of talented open-source contributors around the world, we have been able to build systems in Python that are fast and scalable to many machines and how this has helped Python take over Science.
Talk given at first OmniSci user conference where I discuss cooperating with open-source communities to ensure you get useful answers quickly from your data. I get a chance to introduce OpenTeams in this talk as well and discuss how it can help companies cooperate with communities.
Keynote talk at PyCon Estonia 2019 where I discuss how to extend CPython and how that has led to a robust ecosystem around Python. I then discuss the need to define and build a Python extension language I later propose as EPython on OpenTeams: https://openteams.com/initiatives/2
What is Python? An overview of Python for science. (Nicholas Pringle)
A brief introduction to the use of Python for scientists. Python is fast becoming a popular programming language for scientists. It is free, open source, and constantly improving. Being an easy language to learn, it has a large community of users. Its many favourable qualities make it the perfect language for scientific collaboration.
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314) (Amazon Web Services)
Algorithmia is a startup with a mission to make state-of-the-art machine learning discoverable by everyone: they offer the largest algorithm marketplace in the world, with over 2,500 algorithms supporting tens of thousands of application developers. Algorithmia is the first company to make deep learning, one of the most conceptually difficult areas of computing, accessible to any company via microservices. In this session, you learn how this startup has selected and optimized Amazon EC2 instances for various algorithms (including the latest generation of GPU-optimized instances) to create a flexible and scalable platform. They also share their architecture and best practices for getting any computationally intensive application started quickly.
Large Data Analysis with PyTables
This presentation has been collected from several other presentations (PyTables presentations).
For more presentations in this field, please refer to this link (http://pytables.org/moin/HowToUse#Presentations).
Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes... (Geoffrey Fox)
"Next Generation Grid – HPC Cloud" proposes a toolkit capturing current capabilities of Apache Hadoop, Spark, Flink, and Heron as well as MPI and Asynchronous Many-Task systems from HPC. This supports a Cloud-HPC-Edge (Fog, Device) Function-as-a-Service architecture. Note this "new grid" is focused on data and IoT, not computing. Use interoperable common abstractions but multiple polymorphic implementations.
At my first visit to SciPy in Latin America, I was able to review the history of PyData, SciPy, and NumFOCUS, and discuss how to grow their communities and cooperate in the future. I also introduce OpenTeams as a way for open-source contributors to grow their reputation and build businesses.
A lecture given for Stats 285 at Stanford on October 30, 2017. I discuss how OSS technology developed at Anaconda, Inc. has helped to scale Python to GPUs and Clusters.
Python is dominating the fast-growing data-science landscape. This talk provides a foundational overview of the practice of data science and some of the most popular Python libraries for doing data science. It also provides an overview of how Anaconda brings it all together.
Conda is a cross-platform package manager that lets you quickly and easily build environments containing complicated software stacks. It was built to manage the NumPy stack in Python but can be used to manage any complex software dependencies.
Blaze: a large-scale, array-oriented infrastructure for Python (Travis Oliphant)
This talk gives a high-level overview of the motivation, design goals, and status of the Blaze project from Continuum Analytics, a large-scale array object for Python.
5. Why Python
License
• Free and Open Source, Permissive License
Community
• Broad and friendly community
• Over 34,000 packages on PyPI
• Commercial Support
• Many conferences (PyData, SciPy, PyCons...)
Readable Syntax
• Executable pseudo-code
• Can understand and edit code a year later
• Fun to develop
• Use of Indentation
IPython
• Interactive prompt on steroids
• Allows less working memory
• Allows failing quickly for exploration
Modern Constructs (see the sketch after this list)
• List comprehensions
• Iterator protocol and generators
• Meta-programming
• Introspection
• (JIT Compiler and Concurrency)
Batteries Included
• Internet (FTP, HTTP, SMTP, XMLRPC)
• Compression and Databases
• Logging, unit-tests
• Glue for other languages
• Distribution has much, much more...
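To make the "Modern Constructs" bullets concrete, here is a small illustrative snippet (mine, not the slide's):

import itertools

# List comprehension: filter and transform in one readable expression.
squares_of_evens = [n * n for n in range(10) if n % 2 == 0]

# Generators implement the iterator protocol and produce values lazily.
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

first_ten = list(itertools.islice(fibonacci(), 10))

# Introspection: objects describe themselves at runtime.
print(type(first_ten), len(first_ten), first_ten)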
6. Python supports a developer spectrum
Occasional
• Cut and paste
• Modify a few variables
• Call some functions
• Typical Quant or Engineer who doesn't become a programmer
Scientist Developer
• Extends frameworks
• Builds new objects
• Wraps code
• Quant / Engineer with decent developer skill
Developer
• Creates frameworks
• Creates compilers
• Knows multiple languages
• Typical CS grad
This spectrum is a unique aspect of Python.
7. 1999: Early SciPy emerges
Discussions on the matrix-sig from 1997 to 1999 wanting a complete data analysis environment: Paul Barrett, Joe Harrington, Perry Greenfield, Paul Dubois, Konrad Hinsen, and others. Activity in 1998 led to increased interest in 1999.
In response, on 15 Jan 1999, I posted to matrix-sig a list of routines I felt needed to be present and began wrapping / writing in earnest. On 6 April 1999, I announced I would be creating this uber-package, which eventually became SciPy in 2001.
• Gaussian quadrature: 5 Jan 1999
• cephes 1.0: 30 Jan 1999
• sigtools 0.40: 23 Feb 1999
• Numeric docs: March 1999
• cephes 1.1: 9 Mar 1999
• multipack 0.3: 13 Apr 1999
• Helper routines: 14 Apr 1999
• multipack 0.6 (leastsq, ode, fsolve, quad): 29 Apr 1999
• sparse plan described: 30 May 1999
• multipack 0.7: 14 Jun 1999
• SparsePy 0.1: 5 Nov 1999
• cephes 1.2 (vectorize): 29 Dec 1999
Plotting?? Options at the time: Gist, XPLOT, DISLIN, Gnuplot. Also helping with f2py.
8. Brief History
• Jim Fulton: Matrix Object in Python (1994)
• Jim Hugunin: Numeric (1995)
• Perry Greenfield, Rick White, Todd Miller: Numarray (2001)
• Travis Oliphant: NumPy (2005)
9. Community effort (many, many others!)
• Chuck Harris
• Pauli Virtanen
• Nathaniel Smith
• Warren Weckesser
• Ralf Gommers
• Robert Kern
• David Cournapeau
• Stefan van der Walt
• Jake Vanderplas
• Josef Perktold
• Anne Archibald
• Dag Sverre Seljebotn
• Joe Harrington --- Documentation effort
• Andrew Straw --- www.scipy.org
11. Scientific Stack
NumPy, SciPy, Pandas, Matplotlib, scikit-learn, scikit-image, statsmodels, PyTables, OpenCV, Cython, Numba, SymPy, NumExpr, astropy, BioPython, GDAL, PySAL ... many many more ...
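As a hedged sketch of how the layers of this stack compose in practice (my example, using only widely known APIs):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# NumPy supplies the arrays; pandas adds labels; matplotlib draws the result.
t = np.linspace(0, 2 * np.pi, 200)
df = pd.DataFrame({"t": t, "signal": np.sin(t) + 0.1 * np.random.randn(200)})

print(df["signal"].describe())   # pandas summary statistics
plt.plot(df["t"], df["signal"])  # matplotlib visualization
plt.show()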
12. Now What?
After watching NumPy and SciPy get used all over Science and Technology (including Finance) --- what would I do differently?
Blaze
Numba
Conda (Anaconda)
16. We are big backers of NumFOCUS and organizers of PyData
[The slide also shows Spyder.]
17. How we pay the bills
Areas: Enterprise Python, Scientific Computing, Data Processing, Data Analysis, Visualisation, Scalable Computing
• Products
• Training
• Support
• Consulting
20. Python and Science
Python is the "language of Science". (Lots of R users might disagree.)
The IPython notebook is quickly becoming the way scientists communicate about their work.
Pandas has recently started converting even R users to Python.
21. The problem of Hadoop
Hadoop wants to be the OS for "big data". Advanced analytics and Hadoop don't blend well.
Many people (led by hype) use Hadoop when they don't need to --- and it slows them down and costs them $$.
Scale up first. Then, scale out.
"Don't use Hadoop --- your data is not that big"
http://www.chrisstucchio.com/blog/2013/hadoop_hatred.html
22. Options if you do need Hadoop
• Give Disco a try
• Try a non-Java-specific emerging alternative to HDFS (OrangeFS, GlusterFS, CephFS, Swift)
• Use a Python wrapper to HDFS (snakebite, webHDFS) and an interface to map-reduce (luigi, mrjob, MortarData CPython UDF, etc.) --- a sketch follows this list
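As one hedged illustration of the Python map-reduce interfaces named above, a word count written with mrjob might look like this (assumes mrjob is installed; run locally, or against a cluster with -r hadoop):

from mrjob.job import MRJob

class WordCount(MRJob):
    def mapper(self, _, line):
        # Emit (word, 1) for every word in the input line.
        for word in line.split():
            yield word.lower(), 1

    def reducer(self, word, counts):
        # Sum all the counts emitted for each word.
        yield word, sum(counts)

if __name__ == "__main__":
    WordCount.run()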
26. "The largest data analysis gap is in this man-machine interface. How can we put the scientist back in control of his data? How can we build analysis tools that are intuitive and that augment the scientist's intellect rather than adding to the intellectual burden with a forest of arcane user tools? The real challenge is building this smart notebook that unlocks the data and makes it easy to capture, organize, analyze, visualize, and publish."
-- Jim Gray et al, 2005
27. Why Don't Scientists Use DBs?
• They do not support scientific data types, or access patterns particular to a scientific problem
• Scientists can handle their existing data volumes using programming tools
• Once data was loaded, it could not be manipulated with standard/familiar programs
• Poor visualization and plotting integration
• They require an expensive guru to maintain
28. Convergence
"If one takes the controversial view that HDF, NetCDF, FITS, and Root are nascent database systems that provide metadata and portability but lack non-procedural query analysis, automatic parallelism, and sophisticated indexing, then one can see a fairly clear path that integrates these communities."
30. Continuum key OS technologies
• Conda: cross-platform package manager (with environments)
• Numba: array-oriented Python compiler for CPUs and GPUs (speed target is Fortran)
• Blaze: NumPy and Pandas for out-of-core and distributed data (general database execution engine for a data-flow subset of Python)
• Bokeh: browser-based interactive visualization for Python users
• CDX: Continuum Data Explorer
• Ashiba: new web-app building with only Python and a little HTML
32. What is Conda
• Full package management (like yum or apt-get) but cross-platform
• Control over environments (using link farms) --- better than virtualenv (virtualenv today is like the distutils and setuptools of several years ago: great at first, but you will end up hating it)
• Architected to be able to manage any packages (R, Scala, Clojure, Haskell, Ruby, JS)
• SAT solver to manage dependencies
• User-definable repositories
34. Packaging and Distribution Solved
• conda and binstar solve most of the problems that we have seen people encounter in managing Python installations (especially in large-scale institutions)
• They are supported solutions that can remove the technology pain of managing Python
• They allow focus on software architecture and separation of components (not just whatever makes packaging convenient)
35. Anaconda
Free, enterprise-ready Python distribution of open-source tools for large-scale data processing, predictive analytics, and scientific computing.
36. Anaconda Add-Ons (paid-for)
• Revolutionary Python-to-GPU compiler: extends Numba to take a subset of Python to the GPU (program CUDA in Python), with CUDA FFT / BLAS interfaces --- see the sketch after this list
• Fast, memory-efficient Python interface for SQL databases, NoSQL stores, Amazon S3, and large data files
• NumPy, SciPy, scikit-learn, NumExpr compiled against Intel's Math Kernel Library (MKL)
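The GPU compiler described in the first bullet lives on in today's open-source numba.cuda; here is a minimal sketch with the modern API (my example, not the paid add-on itself; requires a CUDA-capable GPU):

import numpy as np
from numba import cuda

@cuda.jit
def scale(out, arr, factor):
    i = cuda.grid(1)          # absolute index of this GPU thread
    if i < arr.size:
        out[i] = arr[i] * factor

arr = np.arange(1_000_000, dtype=np.float64)
out = np.zeros_like(arr)

threads = 256
blocks = (arr.size + threads - 1) // threads
scale[blocks, threads](out, arr, 2.0)   # launch the kernel on the GPU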
38. Why Numba?
• Python is too slow for loops
• Most people are not learning C/C++/Fortran today
• Cython is an improvement (but still verbose and needs a C compiler)
• NVIDIA is using LLVM for the GPU
• Many people work with large typed containers (NumPy arrays)
• We want to take high-level, array-oriented expressions and compile them to fast code
39. NumPy + Mamba = Numba
[Diagram: a Python function becomes machine code via LLVMPY and the LLVM library, whose backends (CLANG, OpenMP, OpenCL, ISPC, CUDA) span Intel, AMD, Nvidia, Apple, and ARM.]
41. Numba

# 2D image filter (convolution); renamed from the slide's `filter`
# to avoid shadowing the Python builtin.
from numba import jit

@jit('void(f8[:,:], f8[:,:], f8[:,:])')
def filter2d(image, filt, output):
    M, N = image.shape
    m, n = filt.shape
    # Slide the kernel over every interior pixel, accumulating the weighted sum.
    for i in range(m//2, M-m//2):
        for j in range(n//2, N-n//2):
            result = 0.0
            for k in range(m):
                for l in range(n):
                    result += image[i+k-m//2, j+l-n//2] * filt[k, l]
            output[i, j] = result

~1500x speed-up
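A small driver showing how the jitted function above would be called (the 512x512 image and 5x5 averaging kernel are my hypothetical inputs):

import numpy as np

image = np.random.rand(512, 512)
filt = np.full((5, 5), 1.0 / 25.0)   # 5x5 averaging kernel
output = np.zeros_like(image)

filter2d(image, filt, output)   # first call compiles; later calls run at full speed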
42. Numba changes the game!
[Diagram: Python, C, C++, and Fortran all lower to LLVM IR, which targets x86, ARM, and PTX.]
Numba turns (a subset of) Python into a "compiled language" as fast as C (but much more flexible). You don't have to reach for C/C++.
43. Laplace Example

from numba import jit

@jit('void(double[:,:], double, double)')
def numba_update(u, dx2, dy2):
    nx, ny = u.shape
    # Explicit loops over the interior grid points (range replaces Python 2's xrange).
    for i in range(1, nx-1):
        for j in range(1, ny-1):
            u[i,j] = ((u[i+1,j] + u[i-1,j]) * dy2 +
                      (u[i,j+1] + u[i,j-1]) * dx2) / (2*(dx2+dy2))

@jit('void(double[:,:], double, double)')
def numbavec_update(u, dx2, dy2):
    # The same update written with NumPy-style slicing.
    u[1:-1,1:-1] = ((u[2:,1:-1] + u[:-2,1:-1])*dy2 +
                    (u[1:-1,2:] + u[1:-1,:-2])*dx2) / (2*(dx2+dy2))

Adapted from http://www.scipy.org/PerformancePython, originally by Prabhu Ramachandran.
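A hedged driver for the update functions above (the grid setup is my assumption, following the linked PerformancePython example):

import numpy as np

nx = ny = 500
dx2 = dy2 = (1.0 / (nx - 1)) ** 2

u = np.zeros((nx, ny))
u[0, :] = 1.0                 # fixed boundary condition along one edge

for _ in range(100):          # Jacobi-style relaxation sweeps
    numba_update(u, dx2, dy2)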
44. Results of Laplace example
Version: Time (s) / Speed-up
• NumPy: 3.19 / 1.0
• Numba: 2.32 / 1.38
• Vect. Numba: 2.33 / 1.37
• Cython: 2.38 / 1.34
• Weave: 2.47 / 1.29
• Numexpr: 2.62 / 1.22
• Fortran Loops: 2.30 / 1.39
• Vect. Fortran: 1.50 / 2.13
https://github.com/teoliphant/speed.git
45. LLVMPy worth looking at
LLVM (via LLVMPy) has done much heavy lifting.
LLVMPy = compilers for everybody.
47. Blaze Objectives
• Flexible descriptor for tabular and semi-structured data
• Seamless handling of:
  - on-disk / out-of-core data
  - streaming data
  - distributed data
• Uniform treatment of:
  - "arrays of structures" and "structures of arrays"
  - missing values
  - "ragged" shapes
  - categorical types
  - computed columns
48. Blaze Deferred Arrays
[Diagram: the expression A + B*C as a graph, with "+" at the root and "*" combining B and C.]
• Symbolic objects which build a graph
• Represents deferred computation
This is usually what you have when you have a Blaze Array.
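To illustrate the deferred-graph idea, here is a toy sketch (deliberately not Blaze's actual API):

class Expr:
    def __add__(self, other):
        return Node("+", self, other)
    def __mul__(self, other):
        return Node("*", self, other)

class Leaf(Expr):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name

class Node(Expr):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right
    def __repr__(self):
        return "({0} {1} {2})".format(self.left, self.op, self.right)

A, B, C = Leaf("A"), Leaf("B"), Leaf("C")
expr = A + B * C   # builds the graph; nothing is evaluated yet
print(expr)        # (A + (B * C))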
51. Progress
• Basic calculations work out-of-core (via Numba and LLVM)
• Hard dependency on dynd and dynd-python (a dynamic, C++-only multi-dimensional library like NumPy but with many improvements)
• Persistent arrays from BLZ
• Basic array-server functionality for layering over CSV files
• 0.2 release in 1-2 weeks; 0.3 within a month after that (first usable release)
52. Querying BLZ

In [15]: from blaze import blz
In [16]: t = blz.open("TWITTER_LOG_Wed_Oct_31_22COLON22COLON28_EDT_2012-lvl9.blz")
In [17]: t['(latitude>7) & (latitude<10) & (longitude>-10) & (longitude<10)']  # query
Out[17]:
array([ (263843037069848576L, u'Cossy set to release album:http://t.co/Nijbe9Gg Shared via Nigeria News for Android. @', datetime.datetime(2012, 11, 1, 3, 20, 56), 'moses_peleg', u'kaduna', 9.453095, 8.0125194, ''),
  ...
  dtype=[('tid', '<u8'), ('text', '<U140'), ('created_at', '<M8[us]'), ('userid', 'S16'), ('userloc', '<U64'), ('latitude', '<f8'), ('longitude', '<f8'), ('lang', 'S2')])

In [18]: t[1000:3000]  # get a range of tweets
Out[18]:
array([ (263829044892692480L, u'boa noite? ;( \ue058\ue41d', datetime.datetime(2012, 11, 1, 2, 25, 20), 'maaribeiro_', u'', nan, nan, ''),
  (263829044875915265L, u"Nah but I'm writing a gym journal... Watch it last 2 days!", datetime.datetime(2012, 11, 1, 2, 25, 20), 'Ryan_Shizzle', u'Shizzlesville', nan, nan, ''),
  ...
53. Kiva: Array Server
DataShape + Raw JSON = Web Service

type KivaLoan = {
    id: int64;
    name: string;
    description: {
        languages: var, string(2);
        texts: json; # map<string(2), string>
    };
    status: string;        # LoanStatusType
    funded_amount: float64;
    basket_amount: json;   # Option(float64)
    paid_amount: json;     # Option(float64)
    image: {
        id: int64;
        template_id: int64;
    };
    video: json;
    activity: string;
    sector: string;
    use: string;
    delinquent: bool;
    location: {
        country_code: string(2);
        country: string;
        town: json;        # Option(string)
        geo: {
            level: string; # GeoLevelType
            pairs: string; # latlong
            type: string;  # GeoTypeType
        }
    };
    ....

{"id": 200533,
 "name": "Miawand Group",
 "description": {"languages": ["en"],
   "texts": {"en": "Ozer is a member of the Miawand Group. He lives in the 16th district of Kabul, Afghanistan. He lives in a family of eight members. He is single, but is a responsible boy who works hard and supports the whole family. He is a carpenter and is busy working in his shop seven days a week. He needs the loan to purchase wood and needed carpentry tools such as tape measures, rulers and so on.\r\n \r\nHe hopes to make progress through the loan and he is confident that will make his repayments on time and will join for another loan cycle as well. \r\n\r\n"}},
 "status": "paid",
 "funded_amount": 925,
 "basket_amount": null,
 "paid_amount": 925,
 "image": {"id": 539726, "template_id": 1},
 "video": null,
 "activity": "Carpentry",
 "sector": "Construction",
 "use": "He wants to buy tools for his carpentry shop",
 "delinquent": null,
 "location": {"country_code": "AF", "country": "Afghanistan",
   "town": "Kabul Afghanistan",
   "geo": {"level": "country", "pairs": "33 65", "type": "point"}},
 "partner_id": 34,
 "posted_date": "2010-05-13T20:30:03Z",
 "planned_expiration_date": null,
 "loan_amount": 925,
 "currency_exchange_loss_amount": null,
 "borrowers": [
   {"first_name": "Ozer", "last_name": "", "gender": "M", "pictured": true},
   {"first_name": "Rohaniy", "last_name": "", "gender": "M", "pictured": true},
   {"first_name": "Samem", "last_name": "", "gender": "M", "pictured": true}],
 "terms": {
   "disbursal_date": "2010-05-13T07:00:00Z",
   "disbursal_currency": "AFN",
   "disbursal_amount": 42000,
   "loan_amount": 925,
   "local_payments": [
     {"due_date": "2010-06-13T07:00:00Z", "amount": 4200},
     {"due_date": "2010-07-13T07:00:00Z", "amount": 4200},
     {"due_date": "2010-08-13T07:00:00Z", "amount": 4200},
     {"due_date": "2010-09-13T07:00:00Z", "amount": 4200},
     {"due_date": "2010-10-13T07:00:00Z", "amount": 4200},
     {"due_date": "2010-11-13T08:00:00Z", "amount": 4200},
     {"due_date": "2010-12-13T08:00:00Z", "amount": 4200},
     {"due_date": "2011-01-13T08:00:00Z", "amount": 4200},
     {"due_date": "2011-02-13T08:00:00Z", "amount": 4200},
     {"due_date": "2011-03-13T08:00:00Z", "amount": 4200}],
   "scheduled_payments": ...

2.9 GB of JSON => a network-queryable array in ~5 minutes (Kiva Array Server demo)
55. Bokeh Plotting Library
• Interactive graphics for the web
• Designed for large datasets
• Designed for streaming data
• Native interface in Python
• Fast JavaScript component
• DARPA funded
• v0.1 release imminent
56. Reasons for Bokeh
1. Plotting must happen near the data too
2. Quick iteration is essential => interactive visualization
3. Interactive visualization on remote data => use the browser
4. Almost all web plotting libraries are either:
   1. designed for JavaScript programmers, or
   2. designed to output static graphs
5. We designed Bokeh to be dynamic graphing in the web for Python programmers (a minimal example follows this list)
6. Will include "abstract" or "synthetic" rendering (working on Hadoop and Spark compatibility)
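For flavor, a minimal plot with today's Bokeh API (far newer than the v0.1 described here, so take it as an illustration only):

from bokeh.plotting import figure, show

x = list(range(100))
y = [v * v for v in x]

p = figure(title="Interactive plot rendered in the browser")
p.line(x, y, line_width=2)
show(p)   # writes an HTML file and opens it in the browser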
59. Abstract Rendering
Basic AR can identify trouble spots in standard plots, and can also offer automatic tone mapping, taking perception into account.
(37 million elements, showing adjacency between entities in the Kiva dataset)
60. Wakari
• Browser-based data analysis and visualization platform
• Wordpress / YouTube / GitHub for data analysis
• Full Linux environment with Anaconda Python
• Can be installed on internal clusters & servers
61. Why Wakari?
• Data is too big to fit on your desktop
• You need compute power but don't have easy access to a large cluster (the cloud is sitting there with lots of power)
• Configuration of software on a new system stinks (especially a cluster)
• Collaborative data analytics --- you want to build a complex technical workflow and then share it with others easily (without requiring them to do painful configuration to see your results)
• The IPython Notebook is awesome --- let's share it (but we also need the dependencies and data)
62. Wakari
• Free account has 512 MB RAM / 2 GB disk and a shared multi-core CPU
• Easily spin up map-reduce clusters (Disco and Hadoop)
• Use IPython Parallel on many nodes in the cloud --- a sketch follows this list
• Develop GUI apps (possibly in Anaconda) and publish them easily to Wakari, based on the full power of scientific Python and complex technical workflows (IPython notebook for now)
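As a hedged sketch of the "IPython Parallel on many nodes" workflow, using the modern ipyparallel API (cluster setup omitted; assumes engines started with, e.g., ipcluster start -n 4):

import ipyparallel as ipp

rc = ipp.Client()    # connect to the running controller
view = rc[:]         # a view over all engines

# Scatter the work across the engines and gather the results.
squares = view.map_sync(lambda x: x * x, range(16))
print(squares)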
64. Continuum Data Explorer (CDX)
• Open source
• Goal is interactivity
• Combination of IPython REPL, Bokeh, and tables
• Tight integration between GUI elements and REPL
• Current features:
  - Namespace viewer (mapped to IPython namespace)
  - DataTable widget with group-by, computed columns, and advanced filters
  - Interactive plots connected to tables
66. Conclusion
Our projects circle around giving tools to experts (occasional programmers or domain experts) that enable them to move their expertise to the data and get insights --- keep data where it is, and move high-level but performant code to it.
Join us, or ask how we can help you!