In this talk Francesc Campoy, VP of Developer Relations at source{d}, will showcase the source{d} Engine: source{d}’s solution for data extraction from large sets of git repositories.
He will introduce the field of Code as Data and live-demo the kind of insights one can extract from a large codebase with the help of SQL, language classification, and program parsing and token extraction. Expect to see some SQL, lots of cool graphs, and tons of data.
Speaker: Francesc Campoy
Speaker Bio:
Francesc Campoy Flores is the VP of Developer Relations at source{d}, a startup applying ML to source code and building the platform for the future of developer tooling. Previously, he worked at Google as a Developer Advocate for Google Cloud Platform and the Go team.
This talk is about building Audi's big data platform from a first Hadoop PoC to a multi-tenant enterprise platform. Why a big data platform at all? We explain the requirements that drove the development of this platform and explain the decisions we had to make during this journey.
During the process of setting up our big data infrastructure, we often had to find the right balance between enterprise integration and speed. For instance, whether to use the existing Active Directory for both LDAP and KDC or to set up our own KDC. Using a shared enterprise service like Active Directory requires following certain naming conventions and accepting restricted access, whereas running our own KDC brings much more flexibility but also adds another component for us to maintain. We show the advantages and disadvantages and explain why we decided on a certain approach.
For data ingestion of both batch and streaming data, we use Apache Kafka. We explain why we run Kafka on a cluster separate from our Hadoop platform. We discuss the pros and cons of the Kafka binary protocol and the HTTP REST protocol, not only from a technical perspective but also from an organisational perspective, as the source systems are required to push data into Kafka.
We give an overview of our current architecture, including how some use cases are implemented on it. Some of them run exclusively on our new big data stack, while others use it in conjunction with our data warehouse. The use cases cover all kinds of data, from sensor data of robots in our plants to clickstreams from web applications.
Building an enterprise platform consists not only of technical tasks but also of organizational ones: data ownership, authorization to access certain data sets, and more financial matters like internal pricing and SLAs.
Although we have already achieved quite a lot, our journey has not yet ended. There are still open topics to address, like providing a unified logging solution for applications spanning multiple platforms, finally offering a notebook solution such as Zeppelin to our analysts, and addressing legal issues like GDPR.
We will conclude our talk with a short glimpse into our ongoing extension of our on-premises platform into a hybrid cloud platform.
Speakers
Carsten Herbe, Big Data Architect, Audi AG
Matthias Graunitz, Big Data Architect, Audi AG
Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)
DataWorks Summit
Most organizations today implement different data stores to support business operations. As a result, data ends up stored across a multitude of often heterogeneous systems, like RDBMS, NoSQL, data warehouses, data marts, Hadoop, etc., with limited interaction and/or interoperability between them. The end result is often a vast ecosystem of data stores with different "temperature" data, some level of duplication, and no effective way of bringing it all together for business analytics. With such disparate data, how can an organization exploit the wealth of information? This opens up the need for proven techniques to quickly and easily deliver the data to the people who need it. In this session, you'll see how to modernize your enterprise by making data accessible with enterprise capabilities like querying using SQL, granular security for data access, and maintaining high query performance and high concurrency.
"You don't need a bigger boat": serverless MLOps for reasonable companies
Data Science Milan
It is indeed a wonderful time to build machine learning systems, as the growing ecosystems of tools and shared best practices make even small teams incredibly productive at scale. In this talk, we present our philosophy for modern, no-nonsense data pipelines, highlighting the advantages of an (almost) pure serverless and open-source approach, and showing how the entire toolchain works - from raw data to model serving - on a real-world dataset.
Finally, we argue that the crucial component for analyzing data pipelines is not the model per se, but the surrounding DAG, and present our proposal for producing automated "DAG cards" from Metaflow classes.
Bio:
Jacopo Tagliabue was co-founder and CTO of Tooso, an A.I. company in San Francisco acquired by Coveo in 2019. Jacopo is currently the Lead A.I. Scientist at Coveo. When not busy building A.I. products, he is exploring research topics at the intersection of language, reasoning and learning, with several publications at major conferences (e.g. WWW, SIGIR, RecSys, NAACL). In previous lives, he managed to get a Ph.D., do scienc-y things for a pro basketball team, and simulate a pre-Columbian civilization.
Topics: MLOps, Metaflow, model cards.
Microsoft Technologies for Data Science 2016
Mark Tabladillo
Delivered to SQL Saturday BI Edition -- Atlanta, GA
Microsoft provides several technologies in and around Azure which can be used for casual to serious data science. This presentation provides an overview of the major Microsoft options for on-premises, cloud-based, and hybrid data science. These technologies have been used by the presenter in various companies and industries, both as a Microsoft consultant and previously as an independent consultant. The speaker also offers insights into data science careers, and into where the business opportunities will likely be for consultants and partners.
[DataCon.TW 2019] Graph Query on Big-data, REST API, and Live Analysis Systems
Jeff Hung
There are different ways to query data. Relational (algebra), search (engines), and streaming (processing) are the common three. Beyond these, graph query is more complex and resource-consuming, so there is currently no commonly accepted system for graph queries - especially when you have petabytes of data.
In this talk, we will share how Trend Micro built an in-house graph query system for petabytes of data. Moreover, together with big data, this system can also query existing REST-based services and live analysis systems at the same time. This enables researchers at Trend Micro to get the latest intelligence for threat analysis and machine learning modeling.
Unlocking Engineering Observability with advanced IT analytics
source{d}
In this webinar, source{d} CEO Eiso Kant will introduce source{d} Enterprise Edition (EE), the data platform for the software development life cycle (SDLC). With built-in visualization, management capabilities, and advanced analytic functions, source{d} EE provides IT executives with visibility into their software portfolio, engineering processes, and workforce.
Learn how source{d} EE can help everyone in the IT organization to quickly get access to customizable analytic solutions for IT modernization and software compliance, cloud-native and DevOps transformation, engineering effectiveness, and talent management.
A changing market landscape and open source innovations are having a dramatic impact on the consumability and ease of use of data science tools. Join this session to learn about the impact these trends and changes will have on the future of data science. If you are a data scientist, or if your organization relies on cutting edge analytics, you won't want to miss this!
A Tight Ship: How Containers and SDS Optimize the Enterprise
Eric Kavanagh
The Briefing Room with Dez Blanchfield and Red Hat
Think of containers as the drones of modern computing. They're small, agile, and can carry a significant payload. In many ways, they represent the fruition of the last two major paradigm shifts in enterprise software: SOA and virtualization. However, for companies to fully leverage this innovative approach, a persistent storage platform is needed that is as flexible and scalable as containers themselves.
Register for this episode of The Briefing Room to hear Bloor Group Data Scientist Dez Blanchfield, who will explain the significance of container technology, and the relevance of software-defined storage (SDS) in a constantly evolving IT world. He'll be briefed by Steve Watt and Sayan Saha of Red Hat, who will demonstrate how open-source technology can help organizations take advantage of this brave new world of enterprise computing. They will explain how containers are the next step in the evolution of the operating system, and why SDS is now the optimal solution.
How do you reinvent your organization in an iterative and pragmatic way? That is the outcome of using our digital toolbox. It allows you to transform your business model and expand your ecosystem by setting up your digital platform. This reinvention is also supported by adapting your governance, allowing you to innovate while guaranteeing the performance of your organization. For any information / suggestion / collaboration: william.poos@nrb.be
How can .NET contribute to data science? What is .NET Interactive? What do notebooks have to do with it? And Apache Spark? And the Python ecosystem? And Azure? In this session we will put these ideas in order.
How Hewlett Packard Enterprise Gets Real with IoT Analytics
Arcadia Data
Learn how HPE uses visual analytics within a data lake to create an “Industrial Internet of Things” model that solves their data analytics problem at scale.
Large Language Models, Data & APIs - Integrating Generative AI Power into your solutions
.NET User Group Bern
.NET User Group Meetup with Christian Weyer about Large Language Models, Data & APIs - Integrating Generative AI Power into your solutions - with Python and .NET
How do you analyze a Petabyte of data?
The Spark Python API, or PySpark, exposes the Spark programming model to Python. Apache® Spark™ is open source and one of the most popular big data frameworks for scaling up your tasks in a cluster. It was developed to use distributed, in-memory data structures to improve data processing speeds for massive amounts of data.
We’ll also look into Spark SQL, Apache Spark’s module for working with structured data, and MLlib, Apache Spark’s scalable machine learning library.
What will you learn?
Perform Big Data analysis with PySpark
Use SQL queries with DataFrames by using the Spark SQL module
Use machine learning with the MLlib library
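To give a flavor of the Spark SQL material above, here is a sketch of the kind of query you can run once a DataFrame has been registered as a temporary view (the `events` table and its columns are purely illustrative, not from the talk):

```sql
-- Assuming a DataFrame of click events was registered with
-- df.createOrReplaceTempView("events"), Spark SQL can aggregate it directly:
SELECT user_id,
       COUNT(*) AS clicks
FROM events
GROUP BY user_id
ORDER BY clicks DESC
LIMIT 10;
```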
If you're like most of the world, you're on an aggressive race to implement machine learning applications and on a path to get to deep learning. If you can give better service at a lower cost, you will be the winners in 2030. But infrastructure is a key challenge to getting there. What does the technology infrastructure look like over the next decade as you move from Petabytes to Exabytes? How are you budgeting for more colossal data growth over the next decade? How do your data scientists share data today and will it scale for 5-10 years? Do you have the appropriate security, governance, back-up and archiving processes in place? This session will address these issues and discuss strategies for customers as they ramp up their AI journey with a long term view.
Content Strategy and Developer Engagement for DevPortals
Axway
Slides from Write the Docs Ottawa Meet Up at Shopify HQ in Canada, June 24, 2019
We’ll walk through 5 scenarios and concrete ways of reaching a developer community for frictionless and increased engagement.
Single Source of Truth for Network Automation
Andy Davidson
The importance of building a single source of truth for information within your organisation, when you embark upon a network automation project. Simply automating router configuration steps is not "network automation".
MySQL day Dublin - OCI & Application Development
Henry J. Kröger
Slide deck from the MySQL day on the 23rd of October 2018 in the Oracle Dublin office. Presents Oracle's Cloud Infrastructure and Application Development Platform using Docker and Kubernetes.
In early September, Apple released a paper describing Overton, the framework they built to create, monitor, and improve production-based ML systems. After presenting the main lines that define this framework, we will take a closer look at the heart of Overton: slice-based learning.
What's new in the latest source{d} releases!
source{d}
We recently announced source{d} 0.11, 0.12 and 0.13, three releases with lots of new features and performance improvements. From Windows support to port management, C# language support, and new SQL querying, there is a lot for you to get excited about. We also discussed why you should care about Engineering Observability and what some of the top use cases for source{d} in enterprises are.
Code as Data workshop: Using source{d} Engine to extract insights from git repositories
source{d}
This workshop will teach you basic git concepts (such as references, commits, and blobs) and how they can be mapped into a series of relational tables.
Once we understand the basic concepts, we will show how language classification and program parsing are available as SQL custom functions, how to use them correctly, and how to obtain aggregate results with `GROUP BY` and friends. We will discuss Universal Abstract Syntax Trees and how some advanced checks can be done on top of this language-agnostic structure. Running these checks at scale requires some extra knowledge, and we'll discuss the challenges and their possible solutions.
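To give a flavor of the aggregate queries described above, a gitbase-style query might look like the following (the `LANGUAGE` function and the `files` table follow the gitbase schema, but treat the exact names as illustrative):

```sql
-- Count files per detected language across all indexed repositories
SELECT LANGUAGE(file_path, blob_content) AS lang,
       COUNT(*) AS file_count
FROM files
GROUP BY lang
ORDER BY file_count DESC;
```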
To finish, we will also discuss how the information in git repositories encodes a form of social network which can be used to better understand the engineering processes of a given organization.
Gitbase, SQL interface to Git repositories
source{d}
At source{d} we analyze a huge amount of git repositories and extract insights on source code. To do this we have created a powerful engine for language-agnostic analysis of your source code and git history. The git history part of the analysis is handled by Gitbase, an SQL database engine that is able to understand git repositories and is MySQL protocol compatible. You'll learn about our journey, from what began as a side project to the current state, its internals and the different solutions we approached.
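As an illustration of what MySQL protocol compatibility means in practice, you can point any MySQL client at Gitbase and run queries such as the following (column and table names as in the Gitbase schema; treat this as a sketch, not an excerpt from the talk):

```sql
-- Top committers across the indexed repositories
SELECT commit_author_name,
       COUNT(*) AS commits
FROM commits
GROUP BY commit_author_name
ORDER BY commits DESC
LIMIT 10;
```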
In this talk I will discuss how to deduplicate large amounts of source code using the source{d} stack, and more specifically the Apollo project. The three steps of the process used in Apollo will be detailed: the feature extraction step, the hashing step, and the connected component and community detection step. I'll then go on to describe some of the results found from applying Apollo to Public Git Archive, as well as the issues I faced and how these issues could have been somewhat avoided. The talk will be concluded by discussing Gemini, the production-ready sibling project to Apollo, and imagining applications that could extract value from Apollo.
After a quick introduction on the motivation behind Apollo, as said in the abstract I'll describe each step of Apollo's process. As a rule of thumb I'll first describe it formally, then go into how we did it in practice.
Feature extraction: I'll describe code representation, specifically as UASTs, then from there detail the features used. This will allow me to differentiate Apollo from its inspiration, DejaVu, and talk a bit about the taxonomy of code clones. TF-IDF will also be touched upon.
Hashing: I'll describe the basic MinHashing algorithm, then the improvements brought by Sergey Ioffe's variant, and justify its use in our case.
Connected components/Community detection: I'll describe the notions of connected components and communities first (as in graphs), then talk about the different ways we can extract them from the similarity graph.
After this I'll talk about the issues I had applying Apollo to PGA due to the amount of data, and how I worked around the major issues faced. Then I'll go on to the results, show some of the communities, and explain in light of these results how the issues could have been avoided and the whole process improved. Finally I'll talk about Gemini, and outline some of the applications that could be imagined for source code deduplication.
Assisted code review with source{d} lookout
source{d}
Ensuring that a codebase is consistent in style is both hard and costly, yet it is extremely important for maintainability and to reduce technical debt. This problem is one of the many pain points we are currently tackling with source{d} Lookout, our brand new assisted code review framework.
The purpose of source{d} Lookout is to bring assisted code review to anyone in an easy-to-setup, easy-to-use, easy-to-extend fashion. To achieve that, source{d} Lookout watches GitHub repos and triggers a set of analyzers when new code is sent for review or pushed.
source{d} is building the open-source components to enable large-scale code analysis and machine learning on source code. Their powerful tools can ingest all of the world’s public git repositories turning code into ASTs ready for machine learning and other analyses, all exposed through a flexible and friendly API. Francesc will show you how to run machine learning on source code with a series of live demos.
Inextricably linked reproducibility and productivity in data science and ai ...
source{d}
In this talk Mark Coleman presents his team's research comparing the evolution of Software Development & DevOps with that of Data Science & AI.
Because it is more complex and has far more moving parts, Data Science & AI is where Software Development was in 1999: people are emailing and Slacking notebooks to each other due to a lack of appropriate tooling. There are few CI/CD pipelines, and model health monitoring is scarce. A lot that could be automated is still manual, and teams are siloed. This causes problems both for productivity (it is hard to collaborate) and for reproducibility (which impacts governance and compliance).
source{d} Engine: Exploring git repos with SQL
source{d}
Francesc Campoy from source{d} will talk about gitbase, a new open source project fully written in Go that stands on the shoulders of giants, as one says. By integrating the codebases of go-git (the most successful git implementation in Go) and vitess (a replication layer for all the MySQL databases at YouTube), gitbase is able to provide an easy way to extract information from hundreds of git repositories with a simple SQL query.
The talk will provide an in-depth description of the project as well as the way source{d} implemented it and what they learned on the way.
Introduction to source{d} Engine and source{d} Lookout
source{d}
Join us for a presentation and demo of source{d} Engine and source{d} Lookout. Combining code retrieval, language-agnostic parsing, and git management tools with familiar APIs, source{d} Engine simplifies code analysis. source{d} Lookout is a service for assisted code review that enables running custom code analyzers on GitHub pull requests.
We've all wondered how to use Machine Learning with Go, but what about turning the tables for once? What can Machine Learning do *for* Go? During this presentation, we will discover how different Machine Learning models can help us write better Go by predicting everything from our next character to our next bug!
Francesc’s talk will cover the basics of what Machine Learning techniques can be applied to source code, specifically:
- [embeddings over identifiers](https://bit.ly/2HEcQhg)
- structural embeddings over source code, answering the question of how similar two fragments of code are,
- recurrent neural networks for code completion,
- future direction of the research.
While the topic is advanced, the level of mathematics required for this talk will be kept to a minimum. Rather than getting stuck in the details, we'll discuss the advantages and limitations of these techniques, and their possible implications to our developer lives.
go-git is a 100% Go library used to interact with git repositories. Although it already supports most of the functionality, it still lags a bit in performance compared with the git CLI and some other libraries. I'll explain some of the problems that we face when dealing with git repos, along with some examples of performance improvements made to the library.
Machine Learning on Source Code (MLonCode) is an emerging and exciting domain of research which stands at the sweet spot between deep learning, natural language processing, social science and programming. We’ve accumulated petabytes of source code data that is open, yet there have been few attempts to fully leverage the knowledge that is sealed inside. This talk gives an introduction into the current trends in MLonCode and presents the tools and some of the applications, such as deep code suggestions and structural embeddings for fuzzy deduplication.
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
The field of Information Retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse, explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
GraphRAG is All You Need? LLM & Knowledge Graph
Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
JMeter webinar - integration with InfluxDB and Grafana - RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
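A common shape for the integration demonstrated in this webinar is JMeter's built-in Backend Listener streaming results into InfluxDB, with Grafana reading InfluxDB as a data source. A minimal sketch of the listener settings (parameter names as in JMeter 5.x; the host, database name "jmeter", and application label are placeholders to adapt to your environment):

```
Backend Listener implementation:
  org.apache.jmeter.visualizers.backend.influxdb.InfluxdbBackendListenerClient
Parameters:
  influxdbUrl  = http://localhost:8086/write?db=jmeter
  application  = my-app
  measurement  = jmeter
  summaryOnly  = false
  percentiles  = 90;95;99
```

In Grafana, add InfluxDB as a data source pointing at the same database and build dashboards over the "jmeter" measurement.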
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
The Art of the Pitch: WordPress Relationships and Sales - Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Expect practical tips and strategies for successful relationship building that leads to closing the deal.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview - Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across more than 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on countries – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Essentials of Automations: Optimizing FME Workflows with Parameters - Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Key Trends Shaping the Future of Infrastructure - Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
This keynote covers the key trends across hardware, cloud and open source: exploring how these areas are likely to mature and develop over the short and long term, and considering how organisations can position themselves to adapt and thrive.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality - Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
2. Vision: to empower through code
Organizations accumulate a massive amount of source code over time: an
actionable yet overlooked dataset that is ripe for advanced static analysis with
Machine Learning.
source{d} allows you to take developer tooling & engineering insights to the
next level, modernizing how we evaluate software development and write &
review code.
5. From bricks to code
Most valuable public companies
by market cap, 2006–2018
Today, the most valuable companies
come from tech while brick-and-mortar
businesses were quickly left behind.
What is behind this shift?
6. For decades already, value has been shifting from tangible to intangible assets, such as the intellectual property embodied in computer software.
Why?
Because "Every business will become
a software business."
–Satya Nadella, Microsoft CEO
The value shift
S&P500 companies market value
composition, 1975–2015
7. Despite its rising importance, source
code is still an underutilized asset.
The inherent characteristics of source
code – volume, variety, intricacy,
versioning – make it a powerful asset
holding a wealth of implicit knowledge.
The challenge–opportunity window
Why haven't companies leveraged
source code as a competitive
advantage?
8. Source code analysis is a very hard
problem to tackle, even if you are a
dominant tech company.
There is no easy path towards the inevitable digital transformation & business intelligence on code. Those equipped with the adequate tools will lead.
The challenge–opportunity window
It is hard to retrieve and store source code across
scattered repositories in scalable ways.
Maintenance and complexity leave no budget &
resources for codebase modernization.
Tooling has not improved at the pace of innovation.
Engineering executives have poor visibility into
their organization's actual source code.
Companies are not getting any business insights
from their codebase.
9. The solution
"The future is already here — it's
just not very evenly distributed."
–William Gibson, sci-fi writer
10. Quality
Enhance the quality of your
codebase with better data
and insights, detecting
potential defects and
vulnerabilities while having
a deeper understanding of
your architecture.
Agility
Faster retrieval and
analysis of your source
code improving the
efficiency of your
engineering organization
so you can ship on time.
Intelligence
All your codebases in a
single place coupled with
powerful machine learning
based analysis tools to
gain actionable insights for
your teams and business.
Become faster, better, smarter
11. How does it work? We made it simple(r)
CODE AS DATA / MACHINE LEARNING ON CODE
12. Code as Data + ML on Code

CODE AS DATA
Accessible, language-agnostic, large-scale source code analysis.
VALUE PROPOSITION: Code becomes a first-class analyzable asset, be it across thousands or tens of millions of repositories, through our powerful source{d} Engine.

MACHINE LEARNING ON CODE
Sophisticated machine learning tools, algorithms and applications on code.
VALUE PROPOSITION: Code as Data feeds machine learning algorithms which power cutting-edge applications, such as source{d} Lookout for assisted code review.
13. How it stacks together

CODE AS DATA
- source{d} Engine: retrieve and store source code from all of the world's public code, or all code on premise that is stored in version control systems; a SQL interface & distributed computing using Apache Spark to generate large datasets of Universal ASTs, analyzed directly or as input to ML models.

ML ON CODE
- source{d} ML: tools & libraries to train ML models on public or private codebases, as well as models pre-trained on big code.
- source{d} applications: next-generation developer tooling taking advantage of machine learning & big code, such as source{d} Lookout for assisted code review.
14. In summary, the one-stop-shop for your source code analysis needs:
I. Retrieve & store your company code history—or the world's—as a dataset.
II. Parse code as language-agnostic syntax trees & semantic concepts.
III. Query your code base using SQL & analyze it via a Spark API.
IV. Analyze features from code, train and apply machine learning models.
V. Empower developers & managers with code-driven tooling & intelligence.
All designed to perform in a flexible, distributed and scalable manner.
Powered by source{d}
16. Familiar APIs
Analyze your code
through powerful friendly
APIs, such as SQL, gRPC,
REST, and various client
libraries. Use tools you're
familiar with to create
reports and dashboards.
History Analysis
Extract information from
the evolution, commits,
and metadata of your
codebase and generate
detailed reports and
insights.
Code Retrieval
Retrieve and store the
code history of your
organization (including
your open-source
repositories) as a dataset.
source{d} Engine
Analysis in/for
any Language
Automatically identify
languages, parse source
code, and extract the
pieces that matter in a
distributed and
language-agnostic way.
17. Key challenges to use Code as Data

CHALLENGES → SOLUTIONS
- Retrieving and storing code at scale → Distributed retrieval and storage
- Detecting languages & parsing code → Language-agnostic Universal ASTs
- Turning code and history into insights → SQL queries for repositories & UASTs
- Running analysis pipelines at scale → Apache Spark for code analysis
18. Retrieving and storing code at scale
? Code bases can be extremely large.
? Code repositories are often scattered over teams & servers.
? Extracting knowledge from Version Control System history requires special tooling.
? Code repositories often contain duplicates from the same root.
? Code repositories are frequently updated.
? Implementation & maintenance requires massive data engineering effort.
19. Distributed retrieval and storage
[Diagram: repository discovery feeds a fetcher and a pool of workers, which pull public code and internal code repositories into the storage layer]

Retrieval Architecture:
✔ Distributed & scalable
✔ Repository discovery
✔ Keeps VCS history
✔ Efficient storage
✔ Efficient updating
✔ Saves large data engineering effort
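The retrieval architecture above can be sketched in miniature: a discovery step yields repository identifiers, a pool of fetch workers pulls them concurrently, and results land in a shared storage layer. This is an illustrative stand-in with simulated fetches, not source{d}'s actual implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def discover():
    # Stand-in for repository discovery (e.g. listing an organization's repos).
    return ["repo-%d" % i for i in range(10)]

def fetch(url):
    # Stand-in for cloning a repository while keeping its full VCS history.
    return (url, "objects-of-" + url)

def retrieve_all(urls, workers=4):
    storage = {}  # stand-in for the storage layer
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for url, objects in pool.map(fetch, urls):
            storage[url] = objects  # idempotent writes keep updates cheap
    return storage

storage = retrieve_all(discover())
print(len(storage))  # → 10
```

Scaling the same shape out over many machines, plus deduplication and incremental updates, is where the "large data engineering effort" being saved actually lives.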
20. Detecting languages & parsing code
? Identifying programming languages accurately & quickly.
? Analyzing code as plain text is limiting—compilers/interpreters use ASTs.
? Language-specific parsers are needed to tap the power of ASTs.
? Resulting ASTs differ wildly over languages, AST standardization is needed.
? Semantic concepts (e.g. functions) have no standard across languages.
? Speed and scalability.
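Production language detectors such as source{d}'s open-source enry layer several strategies: filename, extension, shebang, and finally a content classifier for ambiguous cases. A toy sketch of just the extension strategy (the mapping below is a tiny illustrative subset, not a real table):

```python
EXT_TO_LANG = {  # tiny illustrative subset of an extension table
    ".go": "Go", ".py": "Python", ".java": "Java",
    ".js": "JavaScript", ".rb": "Ruby",
}

def detect_language(filename):
    # Real detectors fall back to shebang lines, heuristics, and a
    # trained content classifier when the extension is missing or ambiguous.
    for ext, lang in EXT_TO_LANG.items():
        if filename.endswith(ext):
            return lang
    return "Unknown"

print(detect_language("server.go"))  # → Go
```

The fallback chain matters: extensions are fast but wrong for files like "Makefile" or ".h" headers shared by C and C++, which is exactly where content-based classification earns its keep.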
21. Language-agnostic Universal ASTs
✔ Perform language-agnostic analysis
✔ Use standardized semantic concepts
✔ Do complex analyses powered by
Universal ASTs & XPath filters
✔ Add new language support easily
✔ Performance at scale
Source code → Universal ASTs
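To see why a language-agnostic tree matters, compare with a single-language AST. Python's standard-library ast module parses only Python; Universal ASTs apply the same idea across languages by annotating nodes with shared roles. A minimal single-language illustration of the kind of query UASTs make uniform:

```python
import ast

source = """
def add(a, b):
    return a + b

def sub(a, b):
    return a - b
"""

tree = ast.parse(source)
# Walk the tree and collect every function name - the kind of structural
# query a Universal AST lets you run identically over Go, Java, Python, etc.
names = [n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
print(names)  # → ['add', 'sub']
```

With a Universal AST, the equivalent query is an XPath filter over role-annotated nodes, so one analysis serves every supported language.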
22. Turning code and history into insights
? Querying large amounts of code at different levels for insights.
? Typical tooling works only on the current version of your code.
? Access to code bases requires distributed, large scale data engineering.
? New data sources require teams learning new languages and tools.
? Reports, dashboards, updates require custom work from developers.
23. SQL queries for repositories & UASTs
✔ Save on large data engineering effort
✔ Get answers from code history via SQL
✔ Query repositories, files, UASTs over
history
✔ Use MySQL tools your team knows
✔ Any team member can query for
reports, dashboards, charts
[Architecture & data flow: MySQL server → query engine → git interface → storage layer, across a distributed layer]
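The SQL interface exposes repositories, files, and history as ordinary tables. As a self-contained illustration (sqlite3 standing in for the real MySQL-compatible engine, with an invented two-column files table), the "repositories per language" kind of question becomes a one-liner:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (repository_id TEXT, language TEXT)")
conn.executemany("INSERT INTO files VALUES (?, ?)", [
    ("engine", "Go"), ("engine", "Go"), ("ml", "Python"),
    ("ml", "Python"), ("ml", "Go"),
])

# Count files per language - the shape of query the engine answers over
# real repositories, commits, and UASTs.
rows = conn.execute(
    "SELECT language, COUNT(*) FROM files "
    "GROUP BY language ORDER BY COUNT(*) DESC"
).fetchall()
print(rows)  # → [('Go', 3), ('Python', 2)]
```

Because the engine speaks the MySQL wire protocol, the same query works from any MySQL client, BI tool, or dashboard your team already uses.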
24. Running analysis pipelines at scale
? Analyzing large amounts of code data and history for insights.
? Building performant custom data pipelines for ML on code over thousands or millions of code repositories requires distributed computing.
? The complexity of pipeline and integration components in such projects makes deployment in production demanding.
? Different systems/requirements for research/prototyping and production environments prompt extra development work.
25. Apache Spark for code analysis
✔ MapReduce for source code
✔ Friendly API extends Apache Spark™
✔ Integrate tech stack pieces seamlessly
✔ Build data pipelines over code,
VCS history, UASTs
✔ Run locally or on large scale distributed
clusters using containers
[Architecture & data flow: the Apache Spark API drives engine workers, which query gitbase instances and code parsing instances over the storage layer]
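The MapReduce pattern behind the Spark integration can be shown in miniature with plain Python: map each file to a (language, line_count) pair, then reduce by key. The real engine distributes exactly this kind of pipeline over repositories, VCS history, and UASTs; the data below is invented for illustration:

```python
from collections import Counter
from functools import reduce

files = [  # stand-ins for (language, source) pairs extracted from repos
    ("Go", "package main\nfunc main() {}\n"),
    ("Python", "print('hi')\n"),
    ("Go", "package lib\n"),
]

# Map: one (language, line_count) pair per file.
mapped = [(lang, src.count("\n")) for lang, src in files]

# Reduce: sum line counts by language.
def combine(acc, pair):
    lang, n = pair
    acc[lang] += n
    return acc

totals = reduce(combine, mapped, Counter())
print(dict(totals))  # → {'Go': 3, 'Python': 1}
```

In Spark the same logic becomes a map followed by reduceByKey, and running it locally during prototyping versus on a cluster in production changes only the deployment, not the code shape.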
26. Analyses via CLI, GUI, APIs
Graphical web client, command line interface, or your own tools via APIs and client libraries.
27. Have questions? Ask your codebase
"What are our top 10 projects with the
most developers working on them?"
"How many new projects do we start per
year in our organization?"
28. Have questions? Ask your codebase
"How many repositories our codebase
has per programming language?"
28
“Are our security keys for certificates or
applications exposed in our code?”
29. Have questions? Ask your codebase
"How many repositories does our codebase have per programming language?"
30. Have questions? UASTs have answers
import bblfsh

def check(uast):
    findings = []
    sql_commands = {"SELECT", "UPDATE", "DELETE", "INSERT",
                    "CREATE", "ALTER", "DROP"}
    # Binary "+" expressions: string concatenation in any language.
    infixes = bblfsh.filter(uast, "//InfixExpression[@roleAdd and @roleBinary and @roleOperator]")
    for i in infixes:
        # Inspect string literals on the left-hand side of the concatenation.
        strs = bblfsh.filter(i, "//String[@internalRole='leftOperand']")
        for s in strs:
            first_word = s.properties["Value"].split()[0]
            if first_word in sql_commands:
                findings.append({"msg": "Potential SQL injection vulnerability",
                                 "pos": s.start_position})
    return findings
"Is our code vulnerable to SQL injection
types of attacks?"
31. Code As Data Roadmap (2015–2019)

- Creation of go-git: starting to build what would become one of 3 reference implementations of Git
- First data pipeline for the world’s open source code: index 10+ million git repositories and process git & source code by extending Apache Spark
- Universal ASTs are created: we now have the ability to analyze source code, agnostic to languages, as Universal Abstract Syntax Trees
- Distributed SQL for source code: implement a distributed layer over MySQL that allows users to query all the history of repos, code & UASTs
- Cross-Reference Resolution Provided: full integration of cross-references to enable more powerful dependency-aware static analysis
32. source{d} Engine - Enterprise Edition

SOURCE{D} ENGINE FOR THE ENTERPRISE
- Distributed Analysis: multi-node / hybrid cluster (public clouds and on-prem) analysis for large-scale distributed codebases
- Security & Governance: controlled code deployment, forensic code history, RBAC, 3rd-party integrations, audit logs, deployment options, etc.
- Support & Certification: enterprise-grade support and SLA, certified plugins & infrastructure
34. Areas of interest for ML on Code

- Security & Compliance: Malicious Actor Detection, Vulnerability Detection, Malicious Code Detection
- Assisted Code Review: Style Conventions, Idiomatic Code, Naming Suggestions, Architecture Suggestions
- Assisted QA & Testing: Test Suggestions, Test Generation
- Bug Detection & Prediction: Predicting & detecting bugs by learning from code history and issues
- Performance: Memory, CPU & battery optimizations
- Duplicate Code Detection: Language-agnostic duplicate & similar code detection from project to function level
35. ML on Code applications (ordered from near term to the future)

assisted code review
naming suggestions
style transfer on code
vulnerability detection
defect detection
code suggestions
automatic bug fixing
inductive programming
automatic refactoring
AI-based pair programming
natural language to code
math to code
neural compilers
transpilation
36. ML on Code Roadmap (2016–2019)

- First ML on Code models: first models trained on code: topic modeling of 18M repositories and clustering of duplicate code over 10M repos
- ML on Code research repository: the largest reference of machine learning on code with 3k+ readers
- sourced.ml is created: fundamental for ML on Code R&D: feature extraction, model training
- Public Git Archive dataset: published the largest dataset of code to date at 3TB and 180k OSS projects
- Gemini for Duplicate Code Detection: duplicate & similarity code detection at scale, up to function level
- Assisted Code Review: automate parts of the code review process with multiple analyzers, such as code style analyzers
38. ML on Code-powered applications will revolutionize software development.
source{d} ML is the go-to place for ML on Code tools, models & research.
Empower devs & managers with apps such as source{d} Lookout for code review & source{d} Gemini to detect similar code.
Code as Data is inevitable; those
equipped to benefit will be ahead.
source{d} Engine is the one-stop-shop
for large-scale code analysis needs.
Distill the knowledge from your code
and benefit from more agility, quality
and intelligence over your SDLC.
Takeaways
CODE AS DATA / MACHINE LEARNING ON CODE
39. Roadmap (2015–2019)

- Creation of go-git
- First data pipeline for the world’s open source code
- Universal ASTs are created
- First ML on Code models
- ML on Code research repository
- sourced.ml is created
- Distributed SQL for Git
- Public Git Archive dataset
- Gemini for Duplicate Code Detection
- Cross-Reference Resolution Provided
- Assisted Code Review
41. An international Open Core company
30+ employees worldwide; remote first with offices in San Francisco, Madrid
and soon Seattle.
Three pillars: For developers; By developers; Opinionatedly Free Spoken.
Practitioners of Open Source, Open Science and Open Company philosophies.
Experienced founders, senior team members and domain-expert advisors.
$10 million funding from Xavier Niel, Otium, Sunstone Capital and others.
42. Team: key people
Francesc Campoy
VP of Developer
Community
‒ Key Golang Developer
Advocate at Google for
the last 5 years
‒ International software engineer with extensive experience developing in C++ at Google and Amadeus
Vadim Markovtsev
Lead Machine Learning
Engineer
‒ Creator of Samsung’s Veles:
distributed machine learning
platform (responsible for
>90% of the code)
‒ Lead Mail.ru anti-spam efforts
‒ Former associate professor at
the Moscow Institute of
Physics and Technology
Victor Coisne
Head of Growth and
Community
‒ 5+ years as Head of
Community at Docker
‒ Open Source contributor to
many developer advocacy,
community education &
engagement programs
‒ Experienced partner
enablement & relations
manager (Microsoft, IBM,
Digital Ocean, etc)
Eiso Kant
Co Founder & CEO
‒ Programming since the age of
12, these days in Haskell & Go
‒ Co-founder of Tyba
(2011-2015)
‒ Founder of Twollars
(2008-2009)
‒ Used to create software that
automatically generated
websites
Jorge Schnura
Co Founder & COO
‒ Responsible for operations,
“never drops the ball”
‒ Co-founder at Tyba
(2011-2015)
‒ Previously in finance
‒ Loves to automate boring
things in Python
Máximo Cuadros
CTO
‒ 5+ years as CTO, 15+ years
experience
‒ Self-taught, programming for
more than 20 years (polyglot)
‒ Open source contributor to
many projects (CoreOS,
Terraform)
‒ Active member of the Golang
community