How does using Hadoop in the cloud for data analytics fit into the context of continuous deployment? We also look at how the CAP theorem can be used to match data-access patterns with appropriate data frameworks.
How future astronomy projects will generate enormous amounts of data, and what does that mean for astronomical data processing. Part of the virtual observatory course by Juan de Dios Santander Vela, as imparted for the MTAF (Métodos y Técnicas Avanzadas en Física, Advanced Methods and Techniques in Physics) Master at the University of Granada (UGR).
The database world is undergoing a major upheaval. NoSQL databases such as MongoDB and Cassandra are emerging as a compelling choice for many applications. They can simplify the persistence of complex data models and offer significantly better scalability and performance. But these databases have very different and unfamiliar data models and APIs, as well as a limited transaction model. Moreover, the relational world is fighting back with so-called NewSQL databases such as VoltDB, which, by using a radically different architecture, offers high scalability and performance as well as the familiar relational model and ACID transactions. Sounds great, but unlike a traditional relational database, you can't use JDBC and must partition your data.
In this presentation you will learn about popular NoSQL databases - MongoDB and Cassandra - as well as VoltDB. We will compare and contrast each database's data model and Java API using NoSQL and NewSQL versions of a use case from the book POJOs in Action. We will learn about the benefits and drawbacks of using NoSQL and NewSQL databases.
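The requirement to partition your data, mentioned above for VoltDB-style NewSQL systems, can be illustrated with a minimal hash-partitioning sketch. This is plain Python for illustration only, not any vendor's API; the key names and partition count are hypothetical.

```python
# Hypothetical sketch of hash-based data partitioning, the scheme NewSQL
# systems rely on so that most transactions touch only one partition.
import hashlib

NUM_PARTITIONS = 4

def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a partition key to a partition id deterministically."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

# Rows keyed by customer id always land on the same partition, so
# single-partition transactions never need cross-node coordination.
rows = [("customer-1", "Alice"), ("customer-2", "Bob"), ("customer-3", "Eve")]
partitions = {i: [] for i in range(NUM_PARTITIONS)}
for key, value in rows:
    partitions[partition_for(key)].append((key, value))
```

Queries that include the partition key can be routed to a single node, while queries without it must fan out to every partition.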
Presentation: Overview of Kognitio, Kognitio Cloud and the Kognitio Analytical Platform
Kognitio is driving the convergence of Big Data, in-memory analytics and cloud computing. Having delivered the first in-memory analytical platform in 1989, Kognitio designed its software from the ground up to provide highly scalable compute power for rapid execution of complex analytical queries, without the administrative overhead of manipulating data. Kognitio software runs on industry-standard x86 servers, as an appliance, or in Kognitio Cloud, a ready-to-use analytical platform. Kognitio Cloud is a secure, private or public cloud Platform-as-a-Service (PaaS), leveraging the cloud computing model to make the Kognitio Analytical Platform available on a subscription basis. Clients span industries, including market research, consumer packaged goods, retail, telecommunications, financial services, insurance, gaming, media and utilities.
To learn more, visit www.kognitio.com and follow us on Facebook, LinkedIn and Twitter.
Next Generation Data Platforms - Deon Thomas, Thoughtworks
A new generation of technologies and architectures designed to economically extract value from very large volumes of a wide variety of data, by enabling high velocity capture, discovery and/or analysis.
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve... - Felix Gessert
The unprecedented scale at which data is consumed and generated today has created a strong demand for scalable data management and given rise to non-relational, distributed "NoSQL" database systems. Two central problems triggered this process: 1) the vast amounts of user-generated content in modern applications and the resulting request loads and data volumes, and 2) the desire of the developer community to employ problem-specific data models for storage and querying. To address these needs, various data stores have been developed by both industry and research, arguing that the era of one-size-fits-all database systems is over. The heterogeneity and sheer number of these systems - now commonly referred to as NoSQL data stores - make it increasingly difficult to select the most appropriate system for a given application. Therefore, these systems are frequently combined in polyglot persistence architectures to leverage each system in its respective sweet spot. This tutorial gives an in-depth survey of the most relevant NoSQL databases to provide comparative classification and highlight open challenges. To this end, we analyze the approach of each system to derive its scalability, availability, consistency, data modeling and querying characteristics. We present how each system's design is governed by a central set of trade-offs over irreconcilable system properties. We then cover recent research results in distributed data management to illustrate that some shortcomings of NoSQL systems could already be solved in practice, whereas other NoSQL data management problems pose interesting and unsolved research challenges.
If you'd like to use these slides for e.g. teaching, contact us at gessert at informatik.uni-hamburg.de - we'll send you the PowerPoint.
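The polyglot persistence idea from the tutorial abstract, using each store in its sweet spot, amounts to a routing decision at the application layer. A minimal sketch, in which the access patterns and store categories are purely illustrative:

```python
# Illustrative sketch of polyglot persistence: route each access pattern to
# the category of store best suited for it. All names here are hypothetical.
ROUTING = {
    "session_cache": "key-value store",      # low-latency reads and writes
    "product_catalog": "document store",     # flexible, nested records
    "friend_graph": "graph database",        # traversal-heavy queries
    "order_ledger": "relational database",   # ACID transactions
}

def store_for(access_pattern):
    """Pick a backing store for a given access pattern."""
    # Default to a relational store when no specialized fit is known.
    return ROUTING.get(access_pattern, "relational database")
```

In a real architecture this decision is made per service or per dataset, and the routing table reflects the trade-offs (consistency, latency, query model) the tutorial surveys.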
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget - Cloudera, Inc.
YapMap is a new kind of search platform that does multi-quanta search to better understand threaded discussions. This talk will cover how HBase made it possible for two self-funded guys to build a new kind of search platform. We will discuss our data model and how we use row based atomicity to manage parallel data integration problems. We’ll also talk about where we don’t use HBase and instead use a traditional SQL based infrastructure. We’ll cover the benefits of using MapReduce and HBase for index generation. Then we’ll cover our migration of some tasks from a message based queue to the Coprocessor framework as well as our future Coprocessor use cases. Finally, we’ll talk briefly about our operational experience with HBase, our hardware choices and challenges we’ve had.
This presentation contains the following slides:
Introduction To OLAP
Data Warehousing Architecture
The OLAP Cube
OLTP Vs. OLAP
Types Of OLAP
ROLAP vs. MOLAP
Benefits Of OLAP
Introduction - Apache Kylin
Kylin - Architecture
Kylin - Advantages and Limitations
Introduction - Druid
Druid - Architecture
Druid vs Apache Kylin
References
For any queries, contact us: argonauts007@gmail.com
Presented at a workshop by Apalia on building a private cloud: our feedback describing the advantages of a private cloud used to operate a legacy COBOL application migrated from a mainframe to Java and Linux.
Accelerating Big Data Analytics with Apache Kylin - Tyler Wishnoff
Learn about the latest advancements in Apache Kylin and how its OLAP technology is making analytics faster and insights more actionable.
Learn more about Apache Kylin: https://kyligence.io/apache-kylin-overview/
Learn more about Apache Kylin's enterprise version Kyligence: https://kyligence.io/
Human in the loop: a design pattern for managing teams working with ML - Paco Nathan
Strata CA 2018-03-08
https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/64223
Although it has long been used for use cases like simulation, training, and UX mockups, human-in-the-loop (HITL) has emerged as a key design pattern for managing teams where people and machines collaborate. One approach, active learning (a special case of semi-supervised learning), employs mostly automated processes based on machine learning models, but exceptions are referred to human experts, whose decisions help improve new iterations of the models.
Human-in-the-loop: a design pattern for managing teams that leverage ML - Paco Nathan
Strata Singapore 2017 session talk 2017-12-06
https://conferences.oreilly.com/strata/strata-sg/public/schedule/detail/65611
Human-in-the-loop is an approach which has been used for simulation, training, UX mockups, etc. A more recent design pattern is emerging for human-in-the-loop (HITL) as a way to manage teams working with machine learning (ML). A variant of semi-supervised learning called active learning allows for mostly automated processes based on ML, where exceptions get referred to human experts. Those human judgements in turn help improve new iterations of the ML models.
This talk reviews key case studies about active learning, plus other approaches for human-in-the-loop which are emerging among AI applications. We’ll consider some of the technical aspects — including available open source projects — as well as management perspectives for how to apply HITL:
* When is HITL indicated vs. when isn’t it applicable?
* How do HITL approaches compare/contrast with more “typical” use of Big Data?
* What’s the relationship between use of HITL and preparing an organization to leverage Deep Learning?
* Experiences training and managing a team which uses HITL at scale
* Caveats to know ahead of time:
* In what ways do the humans involved learn from the machines?
* In particular, we’ll examine use cases at O’Reilly Media where ML pipelines for categorizing content are trained by subject matter experts providing examples, based on HITL and leveraging open source [Project Jupyter](https://jupyter.org/) for implementation.
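The active-learning loop these HITL abstracts describe can be sketched in a few lines. This is a hypothetical illustration, not code from the talks: the model, the confidence threshold, and the expert-routing stub are all stand-ins.

```python
# Minimal human-in-the-loop / active-learning sketch. The toy model, the 0.8
# confidence threshold, and the expert stub are hypothetical stand-ins.
def classify(item):
    """Stand-in for an ML model: returns (label, confidence)."""
    # Toy rule: short items are confidently "spam"; long ones are uncertain.
    return ("spam", 0.95) if len(item) < 10 else ("ham", 0.55)

def ask_expert(item):
    """Stand-in for routing an exception to a human annotator."""
    return "ham"

CONFIDENCE_THRESHOLD = 0.8
training_examples = []  # expert judgements feed the next model iteration

def label(item):
    predicted, confidence = classify(item)
    if confidence >= CONFIDENCE_THRESHOLD:
        return predicted            # confident: fully automated path
    judged = ask_expert(item)       # exception: defer to a human expert
    training_examples.append((item, judged))
    return judged
```

Confident predictions flow straight through; low-confidence cases escalate to a person, and the collected judgements become labeled examples for the next model iteration.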
Human-in-a-loop: a design pattern for managing teams which leverage ML - Paco Nathan
Big Data Spain, 2017-11-16
https://www.bigdataspain.org/2017/talk/human-in-the-loop-a-design-pattern-for-managing-teams-which-leverage-ml
Human-in-the-loop is an approach which has been used for simulation, training, UX mockups, etc. A more recent design pattern is emerging for human-in-the-loop (HITL) as a way to manage teams working with machine learning (ML). A variant of semi-supervised learning called _active learning_ allows for mostly automated processes based on ML, where exceptions get referred to human experts. Those human judgements in turn help improve new iterations of the ML models.
This talk reviews key case studies about active learning, plus other approaches for human-in-the-loop which are emerging among AI applications. We'll consider some of the technical aspects -- including available open source projects -- as well as management perspectives for how to apply HITL:
* When is HITL indicated vs. when isn't it applicable?
* How do HITL approaches compare/contrast with more "typical" use of Big Data?
* What's the relationship between use of HITL and preparing an organization to leverage Deep Learning?
* Experiences training and managing a team which uses HITL at scale
* Caveats to know ahead of time
* In what ways do the humans involved learn from the machines?
In particular, we'll examine use cases at O'Reilly Media where ML pipelines for categorizing content are trained by subject matter experts providing examples, based on HITL and leveraging open source [Project Jupyter](https://jupyter.org/) for implementation.
Humans in a loop: Jupyter notebooks as a front-end for AI - Paco Nathan
JupyterCon NY 2017-08-24
https://www.safaribooksonline.com/library/view/jupytercon-2017-/9781491985311/video313210.html
Paco Nathan reviews use cases where Jupyter provides a front-end to AI as the means for keeping "humans in the loop". This talk introduces *active learning* and the "human-in-the-loop" design pattern for managing how people and machines collaborate in AI workflows, including several case studies.
The talk also explores how O'Reilly Media leverages AI in Media, and in particular some of our use cases for active learning such as disambiguation in content discovery. We're using Jupyter as a way to manage active learning ML pipelines, where the machines generally run automated until they hit an edge case and refer the judgement back to human experts. In turn, the experts train the ML pipelines purely through examples, not feature engineering, model parameters, etc.
Jupyter notebooks serve as one part configuration file, one part data sample, one part structured log, one part data visualization tool. O'Reilly has released an open source project on GitHub called `nbtransom` which builds atop `nbformat` and `pandas` for our active learning use cases.
This work anticipates upcoming work on collaborative documents in JupyterLab, based on Google Drive. In other words, where the machines and people are collaborators on shared documents.
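The description above treats a notebook as a shared, machine-writable document: part configuration file, part structured log. A minimal sketch of that idea, in the spirit of `nbtransom` but deliberately dependency-free (it manipulates the notebook's JSON directly rather than using the `nbformat` API, and the cell fields shown are the minimum, not a full schema):

```python
# Sketch: a Jupyter notebook is JSON, so people and machines can both append
# cells to it as a shared document. Hand-rolled here; nbtransom builds on
# nbformat and pandas for the real thing.
import json

def new_notebook():
    return {"nbformat": 4, "nbformat_minor": 5, "metadata": {}, "cells": []}

def append_code_cell(nb, source):
    """Either a human expert or an automated pipeline can add a cell."""
    nb["cells"].append({
        "cell_type": "code", "metadata": {}, "execution_count": None,
        "outputs": [], "source": source,
    })

nb = new_notebook()
append_code_cell(nb, "print('annotation from a human expert')")
serialized = json.dumps(nb)        # one part structured log, one part config
roundtrip = json.loads(serialized)
```

Because the whole artifact round-trips through JSON, it can be versioned in Git and shared between collaborators, human or machine.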
Humans in the loop: AI in open source and industry - Paco Nathan
Nike Tech Talk, Portland, 2017-08-10
https://niketechtalks-aug2017.splashthat.com/
O'Reilly Media gets to see the forefront of trends in artificial intelligence: what the leading teams are working on, which use cases are getting the most traction, previews of advances before they get announced on stage. Through conferences, publishing, and training programs, we've been assembling resources for anyone who wants to learn. An excellent recent example: Generative Adversarial Networks for Beginners, by Jon Bruner.
This talk covers current trends in AI, industry use cases, and recent highlights from the AI Conf series presented by O'Reilly and Intel, plus related materials from Safari learning platform, Strata Data, Data Show, and the upcoming JupyterCon.
Along with reporting, we're leveraging AI in Media. This talk dives into O'Reilly uses of deep learning -- combined with ontology, graph algorithms, probabilistic data structures, and even some evolutionary software -- to help editors and customers alike accomplish more of what they need to do.
In particular, we'll show two open source projects in Python from O'Reilly's AI team:
• pytextrank built atop spaCy, NetworkX, datasketch, providing graph algorithms for advanced NLP and text analytics
• nbtransom leveraging Project Jupyter for a human-in-the-loop design pattern approach to AI work: people and machines collaborating on content annotation
Lessons learned from 3 (going on 4) generations of Jupyter use cases at O'Reilly Media. In particular, about "Oriole" tutorials which combine video with Jupyter notebooks, Docker containers, backed by services managed on a cluster by Marathon, Mesos, Redis, and Nginx.
https://conferences.oreilly.com/fluent/fl-ca/public/schedule/detail/62859
https://conferences.oreilly.com/velocity/vl-ca/public/schedule/detail/62858
Strata UK 2017. Computable content leverages Jupyter notebooks to make learning materials more powerful by integrating compute engines, data sources, etc. O’Reilly Media extended this approach to create the new Oriole Online Tutorial medium, publishing notebooks from authors along with video timelines. (A free public tutorial, Regex Golf, by Peter Norvig demonstrates what’s possible with this technology integration.) Each user session launches a Docker container on a Mesos cluster for fully personalized compute environments. The UX is entirely browser based.
See 2020 update: https://derwen.ai/s/h88s
SF Python Meetup, 2017-02-08
https://www.meetup.com/sfpython/events/237153246/
PyTextRank is a pure Python open source implementation of *TextRank*, based on the [Mihalcea 2004 paper](http://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf) -- a graph algorithm which produces ranked keyphrases from texts. Keyphrases are generally more useful than simple keyword extraction. PyTextRank integrates use of `TextBlob` and `SpaCy` for NLP analysis of texts, including full parse, named entity extraction, etc. It also produces auto-summarization of texts, making use of an approximation algorithm, `MinHash`, for better performance at scale. Overall, the package is intended to complement machine learning approaches -- specifically deep learning used for custom search and recommendations -- by developing better feature vectors from raw texts. This package is in production use at O'Reilly Media for text analytics.
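The core TextRank idea can be sketched compactly: build a co-occurrence graph over words, then run PageRank on it. This is a simplified illustration with a hand-rolled power iteration, not the PyTextRank implementation, which adds part-of-speech filtering, lemmatization, and phrase aggregation on top:

```python
# Simplified TextRank sketch: co-occurrence graph + hand-rolled PageRank.
def textrank(words, window=2, damping=0.85, iterations=30):
    # Edges connect distinct words that co-occur within a sliding window.
    edges = set()
    for i in range(len(words)):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[i] != words[j]:
                edges.add(frozenset((words[i], words[j])))
    nodes = set(words)
    neighbors = {n: set() for n in nodes}
    for e in edges:
        a, b = tuple(e)
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Power iteration: a word is important if important words co-occur with it.
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        rank = {
            n: (1 - damping) / len(nodes) + damping * sum(
                rank[m] / len(neighbors[m]) for m in neighbors[n] if neighbors[m]
            )
            for n in nodes
        }
    return sorted(rank, key=rank.get, reverse=True)

tokens = "graph algorithm ranks keyphrases from graph of words".split()
top = textrank(tokens)
```

Words that sit in well-connected neighborhoods bubble to the top, which is why the method tends to surface keyphrases rather than merely frequent words.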
Use of standards and related issues in predictive analytics - Paco Nathan
My presentation at KDD 2016 in SF, in the "Special Session on Standards in Predictive Analytics In the Era of Big and Fast Data" morning track about PMML and PFA http://dmg.org/kdd2016.html
Presented 2015-08-24 at SF Bay ACM, held at the eBay south campus in San Jose.
http://meetup.com/SF-Bay-ACM/events/221693508/
Project Jupyter https://jupyter.org/ evolved from IPython notebooks, and now supports a wide variety of programming language back-ends. Notebooks have proven to be effective tools used in Data Science, providing convenient packages for what Don Knuth coined as "literate programming" in the 1980s: code plus exposition in markdown. Results of running the code appear in-line as interactive graphics -- all packaged as collaborative, web-based documents. Some have said that the introduction of cloud-based notebooks is nearly as fundamental a change in software practice as the introduction of spreadsheets.
O'Reilly Media has been considering the question, "What comes after books and video?" Or, as one might imagine more pointedly, what comes after Kindle? To that point, we have collaborated with Project Jupyter to integrate notebooks into our content management process, allowing authors to generate articles, tutorials, reports, and other media products as notebooks that also incorporate video segments. Code dependencies are containerized using Docker, and all of the content gets managed in Git repositories. We have added another layer, an open source project called Thebe, which provides a kind of "media player" for embedding the containerized notebooks into web pages.
GalvanizeU Seattle: Eleven Almost-Truisms About Data - Paco Nathan
http://www.meetup.com/Seattle-Data-Science/events/223445403/
Almost a dozen almost-truisms about Data that almost everyone should consider carefully as they embark on a journey into Data Science. There are a number of preconceptions about working with data at scale where the realities beg to differ. This talk estimates that number to be at least eleven, though probably much larger. At least that number has a great line from a movie. Let's consider some of the less-intuitive directions in which this field is heading, along with likely consequences and corollaries -- especially for those who are just now beginning to study about the technologies, the processes, and the people involved.
Microservices, containers, and machine learning - Paco Nathan
http://www.oscon.com/open-source-2015/public/schedule/detail/41579
In this presentation, an open source developer community considers itself algorithmically. This shows how to surface data insights from the developer email forums for just about any Apache open source project. It leverages advanced techniques for natural language processing, machine learning, graph algorithms, time series analysis, etc. As an example, we use data from the Apache Spark email list archives to help understand its community better; however, the code can be applied to many other communities.
Exsto is an open source project that demonstrates Apache Spark workflow examples for SQL-based ETL (Spark SQL), machine learning (MLlib), and graph algorithms (GraphX). It surfaces insights about developer communities from their email forums. Natural language processing services in Python (based on NLTK, TextBlob, WordNet, etc.) get containerized and used to crawl and parse email archives. These produce JSON data sets; we then run machine learning on a Spark cluster to surface insights such as:
* What are the trending topic summaries?
* Who are the leaders in the community for various topics?
* Who discusses most frequently with whom?
This talk shows how to use cloud-based notebooks for organizing and running the analytics and visualizations. It reviews the background for how and why the graph analytics and machine learning algorithms generalize patterns within the data — based on open source implementations for two advanced approaches, Word2Vec and TextRank. The talk also illustrates best practices for leveraging functional programming for big data.
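One of the Exsto-style questions above, "who discusses most frequently with whom", reduces to counting reply pairs in the parsed email archive. A plain-Python sketch with made-up sample data (field names are hypothetical; the real project runs the equivalent at scale on Spark):

```python
# Count who replies to whom in an email archive. The message fields and
# sample data are illustrative, not the Exsto schema.
from collections import Counter

messages = [
    {"id": "m1", "sender": "alice", "in_reply_to": None},
    {"id": "m2", "sender": "bob", "in_reply_to": "m1"},
    {"id": "m3", "sender": "alice", "in_reply_to": "m2"},
    {"id": "m4", "sender": "carol", "in_reply_to": "m1"},
]

def reply_pairs(msgs):
    """Undirected counts of reply interactions between distinct senders."""
    by_id = {m["id"]: m for m in msgs}
    pairs = Counter()
    for m in msgs:
        parent = by_id.get(m["in_reply_to"])
        if parent and parent["sender"] != m["sender"]:
            pairs[frozenset((m["sender"], parent["sender"]))] += 1
    return pairs

pairs = reply_pairs(messages)
top_pair, count = pairs.most_common(1)[0]
```

The resulting pair counts become edge weights in the community graph that the GraphX analysis then mines for leaderboards and interaction patterns.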
https://www.eventbrite.com/e/talk-by-paco-nathan-graph-analytics-in-spark-tickets-17173189472
Big Brains meetup hosted by BloomReach, 2015-06-04
Case study / demo of a large-scale graph analytics project, leveraging GraphX in Apache Spark to surface insights about open source developer communities — based on data mining of their email forums. The project works with any Apache email archive, applying NLP and machine learning techniques to analyze message threads, then constructs a large graph. Graph analytics, based on concise Scala coding examples in Spark, surface themes and interactions within the community. Results are used as feedback for respective developer communities, such as leaderboards, etc. As an example, we will examine analysis of the Spark developer community itself.
QCon São Paulo: Real-Time Analytics with Spark Streaming - Paco Nathan
"Real-Time Analytics with Spark Streaming" presented at QCon São Paulo, 2015-03-26
http://qconsp.com/presentation/real-time-analytics-spark-streaming
This talk presents an overview of Spark and its history and applications, then focuses on the Spark Streaming component used for real-time analytics. We compare it with earlier frameworks such as MillWheel and Storm, and explore industry motivations for open-source micro-batch streaming at scale.
The talk will include demos for streaming apps that include machine-learning examples. We also consider public case studies of production deployments at scale.
We’ll review the use of open-source sketch algorithms and probabilistic data structures that get leveraged in streaming – for example, the trade-off of 4% error bounds on real-time metrics for two orders of magnitude reduction in required memory footprint of a Spark app.
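The memory-for-accuracy trade-off described above is exactly what sketch data structures provide. A minimal count-min sketch, one of the probabilistic structures commonly used in streaming (a generic illustration, not code from the talk; the width and depth parameters are arbitrary):

```python
# Minimal count-min sketch: bounded memory, small overestimation error,
# and it never undercounts. Width/depth here are illustrative choices.
import hashlib

class CountMinSketch:
    def __init__(self, width=1000, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One independent-ish hash per row, derived from a salted SHA-256.
        h = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        return int(h, 16) % self.width

    def add(self, item):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += 1

    def count(self, item):
        # Minimum across rows bounds the overestimate from hash collisions.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))

cms = CountMinSketch()
for event in ["click", "click", "view", "click"]:
    cms.add(event)
```

Memory is fixed at width x depth counters regardless of how many distinct events stream through, which is the two-orders-of-magnitude footprint reduction the talk alludes to.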
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More - Paco Nathan
Spark and Databricks component of the O'Reilly Media webcast "2015 Data Preview: Spark, Data Visualization, YARN, and More", as a preview of the 2015 Strata + Hadoop World conference in San Jose http://www.oreilly.com/pub/e/3289
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... - Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
GraphRAG is All You Need? LLM & Knowledge Graph - Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
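The knowledge-graph side of GraphRAG-style retrieval can be sketched as: store facts as triples, pull the subgraph around an entity, and use it to ground an LLM prompt. This is a toy illustration only; the triples and the prompt format are made up, and real systems (FalkorDB, the GraphRAG pipeline) use proper graph storage and query languages:

```python
# Toy sketch of graph-grounded retrieval: facts as (subject, predicate,
# object) triples, with an entity neighborhood used as prompt context.
triples = [
    ("FalkorDB", "is_a", "graph database"),
    ("GraphRAG", "combines", "knowledge graphs"),
    ("GraphRAG", "combines", "large language models"),
    ("knowledge graphs", "store", "triples"),
]

def neighborhood(entity):
    """All facts mentioning the entity, as grounding context."""
    return [t for t in triples if entity in (t[0], t[2])]

def grounding_prompt(entity):
    facts = "; ".join(f"{s} {p} {o}" for s, p, o in neighborhood(entity))
    return f"Answer using only these facts: {facts}"
```

Constraining generation to retrieved facts is what lets the graph curb hallucination on narrative, private data, the motivation behind the GraphRAG work cited above.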
JMeter webinar - integration with InfluxDB and Grafana - RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
1. Hidden Gems found with Hadoop
Paco Nathan
Lead, Analytics team @ IMVU.com
2. Ask Questions Early…
‣ How do Hadoop and “Big Data” fit into the practice
of Continuous Deployment ?
‣ Why don’t we simply load all our data into Oracle,
then generate reports and spreadsheets as needed ?
‣ Given all the conflicting “NoSQL” options, how
does an engineer design an effective data store ?
‣ Is there one framework we can just buy and resolve
all these annoying data issues ?
‣ What kinds of analytics work can be performed
using Hadoop in the cloud ?
‣ Is IMVU currently hiring ? ☺
3. Continuous Deployment
• IMVU: ~50 engineers work in parallel, builds push live every ~8 minutes
• depends on “immune system” regression checks, progressive roll-outs
• dedication to transparency and metrics: data-intensive company culture
• extensive use of customer experiments (A/B testing) on millions of users
• instrumentation, alerting, strict discipline on config and resource usage
• Ops excellence, plus big investment in a finely tuned production environment
http://www.quora.com/What-are-best-examples-of-companies-using-continuous-deployment
http://www.slideshare.net/bgdurrett/3-reasons-you-should-use-continuous-deployment
http://www.startuplessonslearned.com/2009/06/why-continuous-deployment.html
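The “immune system” regression checks mentioned above can be sketched as a post-deploy gate that compares a key metric to its pre-deploy baseline; the metric, threshold, and function names below are illustrative assumptions, not IMVU’s actual tooling:

```python
# Illustrative sketch of an "immune system" deploy gate: after a push goes
# live, compare the observed error rate to the pre-deploy baseline and
# signal a rollback on regression. The threshold and names are assumptions.
REGRESSION_THRESHOLD = 1.5  # alarm if the metric worsens by more than 50%

def immune_check(baseline_error_rate, observed_error_rate):
    """Return 'rollback' when the post-deploy error rate regresses,
    otherwise 'keep' to let the progressive roll-out continue."""
    if observed_error_rate > baseline_error_rate * REGRESSION_THRESHOLD:
        return "rollback"
    return "keep"
```

In a real pipeline a check like this would run continuously during a progressive roll-out, widening the exposed cohort only while the gate keeps returning “keep”.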
5. Data Analytics
• data usage downstream from production cluster is a lower priority
• industry truism: data usage downstream almost never trumps
the priority of direct revenue transactions
• even so, business strategy depends on data analytics – which in
practice, at scale, must live downstream from transactions
• however, data analytics jobs tend to break the extensive
testing/monitoring work that makes continuous deployment possible:
- mission critical code which can’t be verified readily by unit tests
- “slow queries” trip immune system, signaling regressions
- likewise for large data transfers within production cluster
- tightly configured environment vs. elastic resource needs
6. How Did We Get Here?
• big Internet successes after 1997 holiday season…
AMZN, EBAY, then GOOG, Inktomi (YHOO Search)
• consider how, among tech firms, this metric:
annual revenue per customer / operational data store size
dropped more than 100x within a few years after 1997
• “conventional wisdom” of RDBMS and BI tools became
much less viable; however, business cadre which came of
age when “spreadsheets were new” tends to carry along
too much inertia to confront these issues pro-actively
• on the one hand, storage and processing costs plummeted…
on the other hand, we must now work much smarter
to extract ROI from “Big Data”, so methods must adapt
• MapReduce and the Hadoop open source stack grew
directly out of this context… but they only solve part
of these problems
7. CAP Theorem
• Eric Brewer, 2000: “You can have at most two of these properties for
any shared-data system … the choice of which feature to discard
determines the nature of your system.”
• direct revenue apps in consumer Internet require consistency and
partition tolerance
• data analytics jobs for business uses generally require availability and
eventual consistency, but tend not to tolerate highly partitioned data
• ETL becomes an Achilles heel for “Lean Startup™”:
‣ agile/experiment-driven/scale-out, which leads to…
‣ provably-hard-to-detect metadata drift, which leads to…
‣ high-risk technical debt
[CAP triangle diagram: C = strong consistency, A = high availability, P = partition tolerance; RDBMS sits on the C-A side, eventual consistency on the A-P side]
https://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
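A toy sketch of the trade-off above: an AP-style store acknowledges a write at one replica and lets a background anti-entropy pass converge the others, so reads can be stale in between, which analytics tolerates and revenue transactions cannot. Everything here is illustrative and not the API of any real database:

```python
# Toy model of eventual consistency: writes are acknowledged by the
# primary alone (availability over consistency); replicas converge only
# after a background anti-entropy pass. Names are illustrative.

class Replica:
    def __init__(self):
        self.data = {}

def write(replicas, key, value):
    """Acknowledge the write after the primary alone has it."""
    replicas[0].data[key] = value

def anti_entropy(replicas):
    """Background sync: propagate the primary's state to every replica."""
    for r in replicas[1:]:
        r.data.update(replicas[0].data)

replicas = [Replica() for _ in range(3)]
write(replicas, "credits", 100)
stale_reads = [r.data.get("credits", 0) for r in replicas]      # [100, 0, 0]
anti_entropy(replicas)
converged_reads = [r.data.get("credits", 0) for r in replicas]  # [100, 100, 100]
```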
8. Data Access Patterns
• design patterns: originated in consensus negotiation
for architecture, then software engineering
• consider the corollaries in large scale data wrangling…
• essential advice:
select data frameworks based on your data access patterns
• in other words, decouple usage based on need –
to avoid “one size fits all” blockers
• let’s review some examples…
9. Access Patterns ↔ Frameworks
financial transactions general ledger in RDBMS CAx
ad-hoc queries RDS (hosted MySQL) CAx
reporting, dashboards like Pentaho CAx
log rotation/persistence like Riak xxP
search indexes like Lucene, Solr xAP
static content, archives S3 (durable storage) xAP
customer facts like Redis, Membase xAP
distributed counters, locks, sets like Redis x A P*
data objects CRUD key/value – like, NoSQL on MySQL CxP
authoritative metadata like Zookeeper CxP
data prep, modeling at scale like Hadoop/Hive/Cascading + R CxP
graph analysis like Hadoop + Redis + Gephi CxP
data marts like Hadoop/Hive/HBase CxP
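Restating the table above as data makes the “x” column queryable: which frameworks forfeit a given CAP property. The dict and helper are an illustrative sketch, not any framework’s API:

```python
# The slide's table as data: access pattern -> (example framework, the
# CAP property/properties forfeited, i.e. the "x" positions in the slide).
PATTERNS = {
    "financial transactions":    ("general ledger in RDBMS",   "P"),
    "ad-hoc queries":            ("RDS (hosted MySQL)",        "P"),
    "reporting, dashboards":     ("Pentaho",                   "P"),
    "log rotation/persistence":  ("Riak",                      "CA"),
    "search indexes":            ("Lucene, Solr",              "C"),
    "static content, archives":  ("S3",                        "C"),
    "customer facts":            ("Redis, Membase",            "C"),
    "authoritative metadata":    ("Zookeeper",                 "A"),
    "data prep, modeling":       ("Hadoop/Hive/Cascading + R", "A"),
    "data marts":                ("Hadoop/Hive/HBase",         "A"),
}

def forfeits(prop):
    """List the frameworks that give up the named CAP property."""
    return sorted(fw for fw, gives_up in PATTERNS.values() if prop in gives_up)
```

For example, `forfeits("A")` returns the strongly consistent, partition-tolerant frameworks (Zookeeper, the Hadoop stack) that a revenue-critical, always-on path should avoid.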
11. Data Prep → Modeling at Scale
Analytics jobs performed in the cloud with Hadoop, R, etc.:
• log clean-up, sessionization
• roll-ups, slices, sampling, data cubes, visualizations
• language identification, key phrase extraction
• co-occurrence analysis, topic trending
• custom search indexes
• random forests and other classifiers
• connected components, effects across social graph
• virtual economy metrics
Business use cases:
• customer segmentation
• retention models
• anti-fraud
• content recommendation
• ad optimization
[Slide visual: mirrored screenshots of IMVU client event streams, with event names such as NUI:DressUpMode, Add Buddy, Website Login, Unique Registration, Chat Now]
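The first job on the list, log clean-up and sessionization, can be sketched in the map/reduce style used with Hadoop streaming; the tab-separated log format, event names, and 30-minute inactivity gap below are illustrative assumptions:

```python
from itertools import groupby
from operator import itemgetter

# Sessionization sketch in the MapReduce style: the mapper keys raw log
# lines by user, the reducer splits each user's time-sorted events into
# sessions wherever the inactivity gap exceeds a threshold.
SESSION_GAP = 30 * 60  # seconds of inactivity that ends a session

def map_phase(log_lines):
    """Map: parse 'user_id<TAB>timestamp<TAB>event' lines into keyed records."""
    for line in log_lines:
        user, ts, event = line.rstrip("\n").split("\t")
        yield user, (int(ts), event)

def reduce_phase(mapped):
    """Reduce: per user, split the time-sorted events into sessions."""
    for user, group in groupby(sorted(mapped), key=itemgetter(0)):
        events = sorted(ts_ev for _, ts_ev in group)
        sessions, current = [], [events[0]]
        for prev, cur in zip(events, events[1:]):
            if cur[0] - prev[0] > SESSION_GAP:
                sessions.append(current)
                current = []
            current.append(cur)
        sessions.append(current)
        yield user, sessions

logs = ["u1\t0\tlogin", "u1\t100\tchat", "u1\t10000\tlogin", "u2\t50\tlogin"]
result = dict(reduce_phase(map_phase(logs)))
# u1's 9900-second gap splits its events into two sessions; u2 has one
```

On a real cluster the two phases would run as separate streaming processes, with Hadoop performing the shuffle/sort between them.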
12. Finding Hidden Gems…
[Architecture diagram: data objects and transactions in MySQL partitions feed, via ETL and S3, into cloud-based data marts built on Hadoop, Hive, RDS, Lucene/Solr, Redis, R, and Gephi; these serve data access patterns (reporting, ad-hoc queries, search, cache, recommenders, graph analysis, sessionization, predictive modeling, social graph, factor analysis, time series, data visualization) as data services for business use cases]
14. Analytics Team, IMVU.com
• IMVU: 90 employees in Bay Area, $40MM annual rev
• largest virtual goods catalog: 6MM+ UGC items
- Best Places to Work in Bay Area, 2011 & 2010
- Red Herring Global 100 Tech Startup, 2010
- Inc. 500, 2010
http://www.imvu.com/jobs/
@pacoid
Editor's Notes
• prior teams: Jive, ShareThis, Adknowledge, HeadCase
• worked with Ray while at ShareThis on our DW and recommender systems
• 5 years experience with AWS, some at firms 100% in the cloud
• I’m a big believer in asking many questions up-front…
• this talk examines how Hadoop fits into what IMVU is famous for: continuous deployment
• we do some critical work with large data sets which makes RDBMS not a good fit
• CD allows many developers to respond to immediate needs, to experiment frequently
• transparency, measurement, and consistent data-driven decisions are absolutely requisite
• in short, we can handle in minutes or hours what other firms might take days, weeks, or months to do
• decisions and actions are highly distributed, and engineering process is well disciplined
• my team works in Analytics, and our data usage is at a different priority than our production cluster
• this is generally true throughout the industry
• business strategy depends on analytics –
• however, analytics work tends to break what we’ve so carefully instrumented
• how did we reach this condition?
• 1997Q4 through 1998Q1, AMZN/EBAY/GOOG/YHOO redefined data use
• revenue/data size, as a metric, fell through the floor
• previous practices in relational DBs and BI no longer worked so well
• CAP theorem explains an inherent conflict there…
• Internet transactions tend to need different kinds of data management than analytics
• partitioned databases are a solution for one aspect, but in turn cause ETL to become a huge problem
• fortunately, there are patterns we can use to engineer around those conflicts…
• providing that you don’t buy into “one size fits all” sales rhetoric from DB vendors
• design patterns help here: choose data frameworks which fit your data access patterns
• hopefully, this table states the CAP forfeits correctly – email me corrections, please :)
• some of these patterns migrate well to the cloud; you may miss a big opportunity if you don’t
• Redis is notable; rich/flexible atomic operations lend to not-shared cases
• let’s drill down into the Hadoop use cases…
• here are a variety of kinds of data preparation, discovery, modeling, and visualization for which my teams have used Hadoop and AWS
• generally the goal is to automate most all of the work, as “pipelines”, and deliver data products/data services
• these visualizations are actually some recent products from my team (less a few details stripped out)…
• geolocation, topic trending from text analytics, measuring effects across the social graph, and comparing features vs. retention
• BTW, Redis provides an excellent “left brain” to pair with Hadoop “right brain”
• this is not strictly “real-time” analytics, but cost-effective and follows guidance from CAP
• in other words, scalable data frameworks based on prevalent data access patterns
• here is some further reading, which I will post online…