Lynn Langit is an expert in data and NoSQL databases. She has industry awards from Microsoft, Google, and 10Gen and experience as a technical author, trainer, and architect. Her presentation discusses how business intelligence has evolved from optimized relational databases to incorporating big data from a variety of sources. She explains the big data pipeline of acquiring, processing, storing, querying, and visualizing data. Langit demonstrates several NoSQL database options like MongoDB, HBase, Cassandra, and Neo4j, as well as cloud storage services. She compares relational and NoSQL databases and how they apply to different types and sizes of data.
Analyzing big data is a challenge, requiring lots of processing power and storage.
Cloud computing is an ideal platform to tackle this problem. HDInsight on Microsoft Azure deploys Hadoop and other open source big data tools to the cloud, making it easier to take advantage of the high scalability of this platform.
In this session, you will learn what tools are available in HDInsight and how to use them to store, process, and analyze large amounts of data.
Getting to 1.5M Ads/sec: How DataXu manages Big Data - Qubole
DataXu sits at the heart of the all-digital world, providing a data platform that manages tens of millions of dollars of digital advertising investments from Global 500 brands. The DataXu data platform evaluates 1.5 million online ad opportunities every second for our customers, allowing them to manage and optimize their marketing investments across all digital channels. DataXu employs a wide range of AWS services (CloudFront, CloudTrail, CloudWatch, Data Pipeline, Direct Connect, DynamoDB, EC2, EMR, Glacier, IAM, Kinesis, RDS, Redshift, Route 53, S3, SNS, SQS, and VPC) to run various workloads at scale for the DataXu data platform.
In addition, DataXu uses Qubole Data Service (QDS) to offer a Unified Analytics Interface tool to DataXu customers. Qubole, a member of the APN, provides self-managing big data infrastructure in the cloud that leverages spot pricing for cost efficiency and provides fast performance and, most importantly, a streamlined user interface for ease of use.
Attendees will learn how self-managing Hadoop clusters provided by Qubole in the AWS Cloud accelerated DataXu’s batch-oriented analysis jobs, and how Qubole's integration with Amazon Redshift enabled DataXu to perform low-latency, interactive analysis. Further, in the session we'll take a look at how DataXu opened up QDS access to their customers through the QDS user interface, thereby providing them with a single tool for both batch-oriented and interactive analysis. By using the QDS user interface, buyers of the DataXu data service could perform all manner of analysis against the data stored in their Amazon S3 bucket.
Speakers:
Scott Ward
Solutions Architect at Amazon Web Services
Ashish Dubey
Solutions Architect at Qubole
Yekesa Kosuru
VP Engineering at DataXu
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole - Vasu S
This ebook dives deep into Apache Spark optimizations that improve performance, reduce costs, and deliver unmatched scale.
https://www.qubole.com/resources/ebooks/accelerating-time-to-value-of-big-data-of-apache-spark
Big Data - in the cloud or rather on-premises? - Guido Schmutz
You want to implement a Big Data/IoT solution and would like to know whether it should be implemented in the cloud or on-premises. You are interested in the cloud offerings of vendors, the benefits they provide, and whether a similar solution would also be possible on-premises.
This presentation deals with this and other questions. Starting from a vendor-independent reference architecture and corresponding design patterns, different cloud solutions from various vendors are compared and rated. Additionally, it will be shown how such a solution could be implemented on-premises and what a hybrid Big Data/IoT solution could look like.
Many organizations focus on the licensing cost of Hadoop when considering migrating to a cloud platform. But other costs should be considered too, and the biggest impact is the benefit of having a modern analytics platform that can handle all of your use cases. This session will cover lessons learned in assisting hundreds of companies to migrate from Hadoop to Databricks.
Data Con LA 2020
Description
In this session, I introduce the Amazon Redshift lake house architecture which enables you to query data across your data warehouse, data lake, and operational databases to gain faster and deeper insights. With a lake house architecture, you can store data in open file formats in your Amazon S3 data lake.
Speaker
Antje Barth, Amazon Web Services, Sr. Developer Advocate, AI and Machine Learning
Spark is a fast and general engine for large-scale data processing which can solve all of your problems.
… Or can it?
This talk will cover real-world issues encountered during the migration of an existing product to Spark infrastructure.
Aimed at software engineers who have just started to evaluate Spark or who are already using it.
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac... - Amazon Web Services
AWS has a large and growing portfolio of big data management and analytics services, designed to be integrated into solution architectures that meet the needs of your business. In this session, we look at analytics through the eyes of a business intelligence analyst, a data scientist, and an application developer, and we explore how to quickly leverage Amazon Redshift, Amazon QuickSight, RStudio, and Amazon Machine Learning to create powerful, yet straightforward, business solutions.
This session is recommended for anyone interested in understanding how to use AWS big data services to develop real-time analytics applications. In this session, you will get an overview of a number of Amazon's big data and analytics services that enable you to build highly scalable cloud applications that immediately and continuously analyze large sets of distributed data. We'll explain how services like Amazon Kinesis, EMR, and Redshift can be used for data ingestion, processing, and storage to enable real-time insights and analysis into customer, operational, and machine-generated data and log files. We'll explore system requirements and design considerations, and walk through a specific customer use case to illustrate the power of real-time insights on their business.
Big Data Advanced Analytics on Microsoft Azure - Mark Tabladillo
This presentation provides a survey of the advanced analytics strengths of Microsoft Azure from an enterprise perspective (with these organizations being the bulk of big data users) based on the Team Data Science Process. The talk also covers the range of analytics and advanced analytics solutions available for developers using data science and artificial intelligence from Microsoft Azure.
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud - Jeff Hung
Trend Micro has been running big data in on-premises data centers for many years. With Hadoop and its mature ecosystem, we have been able to build a centralized Data Lake that serves massive data processing loads while managing and encouraging new uses of data.
In recent years, we have been shifting our focus to AWS. Due to the decentralized nature of the cloud, the design and thinking behind building a Data Lake are different. We must identify what remains important both on-prem and in the cloud, and what could be done differently to embrace the cloud model.
In this talk, we will elaborate on Trend Micro's considerations and best practices for building a Data Lake on-prem and in the cloud, and share our experience of managing petabyte-scale data over many years of evolution.
Data Engineer's Lunch #55: Get Started in Data Engineering - Anant Corporation
In Data Engineer's Lunch #55, CEO of Anant, Rahul Singh, will cover 10 resources every data engineer needs to get started or master their game.
Accompanying Blog: Coming Soon!
Accompanying YouTube: Coming Soon!
Sign Up For Our Newsletter: http://eepurl.com/grdMkn
Join Data Engineer’s Lunch Weekly at 12 PM EST Every Monday:
https://www.meetup.com/Data-Wranglers-DC/events/
Cassandra.Link:
https://cassandra.link/
Follow Us and Reach Us At:
Anant:
https://www.anant.us/
Awesome Cassandra:
https://github.com/Anant/awesome-cassandra
Email:
solutions@anant.us
LinkedIn:
https://www.linkedin.com/company/anant/
Twitter:
https://twitter.com/anantcorp
Eventbrite:
https://www.eventbrite.com/o/anant-1072927283
Facebook:
https://www.facebook.com/AnantCorp/
Join The Anant Team:
https://www.careers.anant.us
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution - Dmitry Anoshin
This session will cover building a modern Data Warehouse by migrating from a traditional DW platform into the cloud, using Amazon Redshift and the cloud ETL tool Matillion, in order to provide Self-Service BI for the business audience. This topic will cover the technical migration path from a DW with PL/SQL ETL to Amazon Redshift via Matillion ETL, with a detailed comparison of modern ETL tools. Moreover, this talk will focus on working backward through the process, i.e. starting from the business audience and the needs that drive changes in the old DW. Finally, this talk will cover the idea of self-service BI, and the author will share a step-by-step plan for building an efficient self-service environment using the modern BI platform Tableau.
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the... - Mark Rittman
Hadoop and NoSQL platforms initially focused on Java developers and slow but massively-scalable MapReduce jobs as an alternative to high-end but limited-scale analytics RDBMS engines. Apache Hive opened up Hadoop to non-programmers by adding a SQL query engine and relational-style metadata layered over raw HDFS storage, and since then open-source initiatives such as Hive Stinger, Cloudera Impala and Apache Drill along with proprietary solutions from closed-source vendors have extended SQL-on-Hadoop’s capabilities into areas such as low-latency ad-hoc queries, ACID-compliant transactions and schema-less data discovery – at massive scale and with compelling economics.
In this session we’ll focus on the technical foundations of SQL-on-Hadoop, first reviewing the basic platform Apache Hive provides and then looking in more detail at how ad-hoc querying, ACID-compliant transactions and data discovery engines work, along with the more specialised underlying storage that each now works best with – and we’ll take a look to the future to see how SQL querying, data integration and analytics are likely to come together in the next five years to make Hadoop the default platform running mixed old-world/new-world analytics workloads.
Cloud Computing in the Cloud (Hadoop.tw Meetup @ 2015/11/23) - Jeff Hung
It is a common belief that Hadoop should run on physical servers. However, this requires a huge capital investment up front with no guarantee of returns. As a result, teams usually end up proving big data with not-that-big data. One way to work around this dilemma is to run Cloud Computing in the Cloud. With the elasticity that AWS provides, you can spend little but run big! However, is it really a good idea? In this talk, we will try to answer that – based on the results of a 1-year journey with a real application and real big data.
What exactly is big data? The definition of big data is data that contains greater variety, arriving in increasing volumes and with more velocity. This is also known as the three Vs. Put simply, big data is larger, more complex data sets, especially from new data sources.
Extract business value by analyzing large volumes of multi-structured data from various sources such as databases, websites, blogs, social media, smart sensors...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle... - Rittman Analytics
Most DBAs are aware something interesting is going on with big data and the Hadoop product ecosystem that underpins it, but aren't so clear about what each component in the stack does, what problem each part solves and why those problems couldn't be solved using the old approach. We'll look at where it's all going with the advent of Spark and machine learning, what's happening with ETL, metadata and analytics on this platform ... and why IaaS and data warehousing-as-a-service will have such a big impact, sooner than you think.
Big Data brings big promise and also big challenges, the primary and most important one being the ability to deliver Value to business stakeholders who are not data scientists!
NoSQL databases like MongoDB, Elasticsearch, and Cassandra are synonymous with scalability, search, and developer agility. But there’s a downside...having to give up the ease and comfort of SQL.
Or do you?
Join this webcast to learn how the newest databases, like CrateDB and CockroachDB, deliver the benefits of NoSQL with the ease of SQL by building SQL engines on top of custom NoSQL technology stacks. Database industry veteran Andy Ellicott, who helped launch Vertica, VoltDB, and Cloudant and is now with Crate.io, will provide a no-BS view of current DBMS architectures and predictions for the future of data.
If you’re a DBMS user, this webcast will help you make sense of a very crowded DBMS market and make better-informed decisions for your new tech stacks.
Introduction to Cloud computing and Big Data-Hadoop - Nagarjuna D.N
Cloud Computing Evolution
Why is Cloud Computing needed?
Cloud Computing Models
Cloud Solutions
Cloud Job opportunities
Criteria for Big Data
Big Data challenges
Technologies to process Big Data - Hadoop
Hadoop History and Architecture
Hadoop Eco-System
Hadoop Real-time Use cases
Hadoop Job opportunities
Hadoop and SAP HANA integration
Summary
Slides used for the keynote at the event Big Data & Data Science http://eventos.citius.usc.es/bigdata/
Some slides are borrowed from random Hadoop/big data presentations
The Common BI/Big Data Challenges and Solutions presented by seasoned experts, Andriy Zabavskyy (BI Architect) and Serhiy Haziyev (Director of Software Architecture).
This was a complimentary workshop where attendees had the opportunity to learn, network and share knowledge during the lunch and education session.
Hadoop meets Agile! - An Agile Big Data Model - Uwe Printz
Big Data projects are a struggle, not only on the technical side but also on the organizational side. In this talk the author shares his experience and opinions from almost 5 years of Big Data projects and develops an Agile Big Data Model which reflects his ideas on how Big Data projects can be successful, even in large companies.
Talk held at the crossover meetup of the "Agile Stammtisch Rhein-Main" and the "Hadoop & Spark User Group Rhein-Main" at codecentric AG on 31.01.2017.
MongoDB & Hadoop - Understanding Your Big Data - MongoDB
Big Data is the evolution of supercomputing for commercial enterprise and governments. Originally the domain of companies operating at Internet scale, today Big Data connects organizations of all sizes with discovery about their patterns, and insights into their business.
But understanding the differences between the plethora of new technologies can be daunting. Graph / columnar / key value store / document are all called NoSQL, but which is best? How does Hadoop play in this ecosystem - its low cost and high efficiency have made it very popular, but how does it fit?
In this webinar, we will explore:
The full spectrum of Big Data
Hadoop and MongoDB: friends or frenemies?
Differences between Systems of Record and Systems of Engagement
MongoDB customer examples of Systems of Engagement
Deck from a talk at YOW Data in Sydney. Covers VariantSpark, a custom Apache Spark machine learning library, and also GT-Scan2, which uses an AWS Lambda architecture, for bioinformatics.
VariantSpark - a Spark library for genomics - Lynn Langit
VariantSpark is a custom Apache Spark library for genomic data. It provides a custom wide random forest machine learning algorithm, designed for workloads with millions of features.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Connector Corner: Automate dynamic content and events by pushing a button - DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for technology and making things work, along with a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 (Tobias Schneck)
As AI technology pushes into IT, I wondered, as an “infrastructure container Kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations point of view. Is it possible to apply our lovely cloud native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and give you a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies that could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into which approaches I already have working for real.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
JMeter webinar - integration with InfluxDB and Grafana (RTTS)
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Expect practical tips and strategies for successful relationship building that leads to closing the deal.
Essentials of Automations: Optimizing FME Workflows with Parameters (Safe Software)
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
NoSQL for the SQL Server Pro
1. NoSQL for the DBA
Lynn Langit
April 2013 – Big Data Tech Con
2. Data Expertise / Lynn Langit
• Industry awards
– Microsoft – MVP for SQL Server
– Google – GDE for Cloud Platform
– 10Gen – Master for MongoDB
• Practicing Architect
• Technical author / trainer
– Pluralsight – Google Cloud Series
– DevelopMentor – SQL Server Series
– 2 books on SQL Server BI
– Cloudera trainer (certified)
• Former MSFT FTE
– 4 years
14. Big Data – an example from weather
• Source Data
• National weather data
• Satellite data
• Airplanes with sensors
• Sensors on boats
• Sensors in the ocean
• Sensors on the ground
• Historical Data
• Social Media
• Results
• More accurate predictions
• Tsunami
• Tornado
15. Big Data – an example from health care
• Medical records
• Regular
• Emergency
• Genetic data – 23andMe
• Food data
• SparkPeople
• Purchasing
• Grocery card
• credit card
• Search – Google
• Social media
• Twitter
• Facebook
• Exercise
• Nike Fuel Band
• Kinect
• Location - phone
16. BigData = ‘Next State’ Questions
Collecting Behavioral data
• What could happen?
• Why didn’t this happen?
• When will the next new thing happen?
• What will the next new thing be?
• What happens?
17. What is the reality of personalized medicine?
[Chart: Sensor Readings (y-axis, 0 to 2500) over time (x-axis, 12:00 to 2:30); labels: Key Monitoring, Other Behavioral data]
19. Collecting BigData
• Sensors everywhere
• Structured, Semi-structured, Unstructured vs. Data Standards
• M2M
• Public Datasets
– Freebase
– Azure DataMarket
– Hilary Mason’s list
20. DEMO – Hilary Mason’s Datasets
• Who is Hilary Mason and why do you care about her datasets?
• How do you get her datasets?
• What do you do with her datasets?
21. Collecting Data – a note about Faces
• Facial recognition
• Voice recognition
• Gesture capture and analysis
24. Big Data in India
Update: “The total number of AADHAARs issued as of 24-Mar-2013 is over 304 million. This is more than 25% of the population of India.”
25. BigData Pipeline – STEP 5 - Visualize
Acquire
Process
Store
Query & Mine
Visualize
28. BigData Pipeline – STEP 2 - Process
Acquire
Process
Store
Query & Mine
Visualize
29. How do you clean up the mess?
• Data Hygiene
• Data Scrubbing
• Data Sprawl
• The true cost of data
• …and what about data integrity?
• …and security?
• …should your data be in the cloud?
30. Is NoSQL just Hadoop?
HUGE Hype factor since 2011
Apache Hadoop
• a software framework that supports data-intensive distributed applications
• released under a free license; enables applications to work with thousands of nodes and petabytes of data
• was inspired by Google's MapReduce and Google File System (GFS) papers
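The MapReduce model named on this slide can be illustrated with a minimal single-process sketch. This is not Hadoop code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample documents are assumptions chosen for illustration, and a real cluster would distribute each phase across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over an iterable of strings."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, not taken from the presentation.
    print(word_count(["big data big pipeline", "data pipeline"]))
```

Because each map call sees only one record and each reduce call sees only one key, the phases can in principle run in parallel on separate machines, which is the property the slide's "thousands of nodes" claim relies on.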
31. What is the relationship?
NoSQL Hadoop ??? BigData
39. Example Comparison: RDBMS vs. Hadoop
Attribute | Traditional RDBMS | Hadoop / MapReduce
Data Size | Gigabytes (Terabytes) | Petabytes and greater
Access | Interactive and Batch | Batch – NOT Interactive
Updates | Read / Write many times | Write once, Read many times
Structure | Static Schema | Dynamic Schema
Integrity | High (ACID) | Low
Scaling | Nonlinear | Linear
Query Response Time | Can be near immediate | Has latency (due to batch processing)
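The "Static Schema" versus "Dynamic Schema" row of the comparison can be sketched as follows. This is only an illustration under assumptions: SQLite stands in for the RDBMS, a list of JSON lines stands in for raw files in a distributed store, and the table name `readings`, the helper names and the sample records are all hypothetical.

```python
import json
import sqlite3


def load_static(rows):
    """Static schema: the table layout is fixed before any row is stored."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (sensor TEXT NOT NULL, value REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO readings (sensor, value) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def read_dynamic(raw_lines, field):
    """Dynamic schema: keep raw records as-is and interpret them only when read."""
    values = []
    for line in raw_lines:
        record = json.loads(line)
        if field in record:  # records are free to omit or add fields
            values.append(record[field])
    return values


if __name__ == "__main__":
    conn = load_static([("a", 1.5), ("b", 2.5)])
    print(conn.execute("SELECT SUM(value) FROM readings").fetchone()[0])

    raw = [
        '{"sensor": "a", "value": 1.5}',
        '{"sensor": "b", "value": 2.5, "unit": "C"}',
        '{"device": "c"}',
    ]
    print(read_dynamic(raw, "value"))
```

With the static schema, a row that violates the declared layout is rejected at write time; with the dynamic approach, irregular records are accepted and the cost of interpreting them moves to every read.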
55. Graph Databases
• a lot of many-to-many relationships
• recursive self-joins
• when your primary objective is quickly finding connections, patterns and relationships between the objects within lots of data
• Examples:
– Neo4J
– Google Freebase
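The "quickly finding connections" use case can be sketched with a breadth-first search over an in-memory adjacency mapping. This is not how Neo4j or any specific graph database is queried; the `shortest_path` helper and the sample social graph are assumptions for illustration only.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search over an adjacency mapping.

    Returns the list of nodes on one shortest path, or None if unreachable.
    """
    if start == goal:
        return [start]
    visited = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        for neighbor in graph.get(path[-1], ()):
            if neighbor in visited:
                continue
            if neighbor == goal:
                return path + [neighbor]
            visited.add(neighbor)
            queue.append(path + [neighbor])
    return None


if __name__ == "__main__":
    # Hypothetical "who knows whom" graph.
    friends = {
        "ann": ["bob", "cat"],
        "bob": ["ann", "dan"],
        "cat": ["ann"],
        "dan": ["bob", "eve"],
        "eve": ["dan"],
    }
    print(shortest_path(friends, "ann", "eve"))
```

In a relational schema the same question would need one self-join per hop, which is the "recursive self-joins" pain point the slide lists.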
63. NoSQL Applied
• Columnstore: Log Files (HBase)
• Key/Value: Product Catalogs (DynamoDB)
• Document: Social Games (MongoDB)
• Graph: Social aggregators (Neo4j)
• RDBMS: Line-of-Business (SQL Server)
64. Cloud Offerings– RDBMS AND NoSQL
Category | AWS | Google | Microsoft
RDBMS | RDS – all major | mySQL | SQL Azure
NoSQL buckets | S3 or Glacier | Cloud Storage | Azure Blobs
NoSQL Key-Value | DynamoDB | H/R Data on GAE | Azure Tables
Streaming ML or Custom | EC2 (Mahout) | Prospective Search & Prediction API | StreamInsight
NoSQL Document or Graph | MongoDB on EC2 | Freebase | MongoDB on Windows Azure
NoSQL – Column Hadoop (HBase) | Elastic MapReduce using S3 & EC2 | none | HDInsight
Dremel/Warehousing | RedShift | BigQuery | none
69. Can Excel help?
• Connector to Hadoop
• Data Explorer
• Data Quality Services
• Master Data Services
• Integration with Azure Data Market
• Visualize with PowerView
• Data Mining w/Predixion
74. Other types of cloud data services
Hosting public datasets
• Pay to read data
• Earn revenue by offering for read
Cleaning / matching (your)
• ETL – Microsoft Data Explorer, Google Refine
• Data Quality – Windows Azure Data Market, InfoChimps, DataMarket.com
75. NoSQL To-Do List
Understand CAP & types of NoSQL databases
• Use NoSQL when business needs designate
• Use the right type of NoSQL for your business problem
Try out NoSQL on the cloud
• Quick and cheap for behavioral data
• Mashup cloud datasets
• Good for specialized use cases, e.g. dev, test, training environments
Learn NoSQL access technologies
• New query languages, e.g. MapReduce, R, Infer.NET
• New query tools (vendor-specific) – Google Refine, Amazon Karmasphere, Microsoft Excel connectors, etc.
77. www.TeachingKidsProgramming.org
• Free Courseware (recipes)
• Do a Recipe Teach a Kid (Ages 10 ++)
• Java or Microsoft SmallBasic TKP site
• C# via Pluralsight
78. Toward Data Craftsmanship…
Follow me @LynnLangit
RSS my blog
www.LynnLangit.com
Hire me
• To help build your BI/Big Data solution
• To teach your team next gen BI
• To learn more about using NoSQL solutions
Editor's Notes
From the O’Reilly / Strata “Getting Ready for Big Data” Report… “the three Vs of volume, velocity and variety are commonly used to characterize different aspects of big data”
http://hortonworks.com/technology/hortonworksdataplatform/
More about HBase, from the O’Reilly ‘Getting Ready for Big Data’ report:
“Enter HBase, a column-oriented database that runs on top of HDFS. Modeled after Google’s BigTable, the project’s goal is to host billions of rows of data for rapid access. MapReduce can use HBase as both a source and a destination for its computations, and Hive and Pig can be used in combination with HBase. In order to grant random access to the data, HBase does impose a few restrictions: performance with Hive is 4-5 times slower than plain HDFS, and the maximum amount of data you can store is approximately a petabyte, versus HDFS’ limit of over 30PB.”
http://www.cloudera.com/
https://www.hadooponazure.com/Account
Demo - http://www.youtube.com/watch?v=XcHz8aUDDN8 and http://www.youtube.com/watch?v=c7oHntP8HBI
Original Reference: Tom White’s Hadoop: The Definitive Guide (I made some modifications based on my experience)
http://nosql-database.org/
http://hadoop.apache.org/ & http://www.mongodb.org/
Wikipedia - http://en.wikipedia.org/wiki/NoSQL
List of NoSQL databases – http://nosql-database.org/
The good, the bad - http://www.techrepublic.com/blog/10things/10-things-you-should-know-about-nosql-databases/1772
http://code.google.com
Access via REST APIs
Very cheap, but not much functionality included
Lots of code to write for application development
But… can be a good backup solution
http://berb.github.com/diploma-thesis/original/061_challenge.html
http://nosqltips.blogspot.com/2011/04/cap-diagram-for-distribution.html
http://blog.mccrory.me/2010/11/03/cap-theorem-and-the-clouds/
http://amitpiplani.blogspot.com/2010/05/u-pick-2-selection-for-nosql-providers.html
ACID VS BASE
ACID
RDBMS are the predominant database systems currently in use for most applications, including web applications. Their data model and their internals are strongly connected to transactional behaviour when operating on data. However, transactional behaviour is not solely related to RDBMS, but is also used for other systems. A set of properties describes the guarantees that database transactions generally provide in order to maintain the validity of data [Hae83].
Atomicity
This property determines that a transaction executes in an "all or nothing" manner. A transaction can either be a single operation, or a sequence of operations or sub-transactions. As a result, a transaction either succeeds and the database state is changed, or the database remains unchanged, and all operations are dismissed.
Consistency
In the context of transactions, we define consistency as the transition from one valid state to another, never leaving behind an invalid state when executing transactions.
Isolation
The concept of isolation guarantees that no transaction sees premature operations of other running transactions. In essence, this prevents conflicts between concurrent transactions.
Durability
The durability property assures persistence of executed transactions. Once a transaction has committed, the outcome of the transaction such as state changes are kept, even in case of a crash or other failures.
Strongly adhering to the principles of ACID results in an execution order that has the same effect as a purely serial execution. In other words, there is always a serially equivalent order of transactions that represents the exact same state [Dol05].
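The atomicity property described above can be shown with a small sketch. It assumes SQLite (via Python's standard `sqlite3` module) as a stand-in for an RDBMS; the `accounts` table, the `make_db` and `transfer` helpers, and the balances are hypothetical. The credit is applied before the debit on purpose, so that a failed debit demonstrates the earlier change being rolled back.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, source, target, amount):
    """Move `amount` from source to target; returns True only if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            # Fails the CHECK constraint if the source would go negative.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = make_db()
    print(transfer(conn, "alice", "bob", 500))
    print(dict(conn.execute("SELECT name, balance FROM accounts")))
```

When the transfer fails, the credit that had already been applied to the target account is undone as well, so the database is left exactly as it was before the transaction started.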
It is obvious that ensuring a serializable order negatively affects performance and concurrency, even when a single machine is used. In fact, some of the properties are often relaxed to a certain extent in order to improve performance. A weaker isolation level between transactions is the most used mechanism to speed up transactions and their throughput. Stepwise, a transactional system can leave serializability and fall back to the weaker isolation levels repeatable reads, read committed and read uncommitted. These levels gradually remove range locks, read locks and write locks (in that order). As a result, concurrent transactions are less isolated and can see partial results of other transactions, yielding so-called read phenomena. Some implementations also weaken the durability property by not guaranteeing to write directly to disk. Instead, committed states are buffered in memory and eventually flushed to disk. This heavily decreases latencies at the cost of data integrity.
Consistency is still a core property of the ACID model, that cannot be relaxed easily. The mutual dependencies of the properties make it impossible to remove a single property without affecting the others. Referring back to the CAP theorem, we have seen the trade-off between consistency and availability regarding distributed database systems that must tolerate partitions. In case we choose a database system that follows the ACID paradigm, we cannot guarantee high availability anymore. The usage of ACID as part of a distributed system yields the need for distributed transactions or similar mechanisms for preserving the transactional properties when state is shared and sharded onto multiple nodes.
Now let us reconsider what would happen if we evict the burden of distributed transactions. As we are talking about distributed systems, we have no global shared state by default. The only knowledge we have is a per-node knowledge of its own past.
Having no global time, no global now, we cannot inherently have atomic operations on system level, as operations occur at different times on different machines. This softens isolation and we must leave the notion of global state for now. Having no immediate, global state of the system in turn endangers durability.
In conclusion, building distributed systems adhering to the ACID paradigm is a demanding challenge. It requires complex coordination and synchronization efforts between all involved nodes, and generates considerable communication overhead within the entire system. It is not for nothing that distributed transactions are feared by many architects [Hel09,Alv11]. Although it is possible to build such systems, some of the original motivations for using a distributed database system have been mitigated on this path. Isolation and serializability contradict scalability and concurrency. Therefore, we will now consider an alternative model for consistency that sacrifices consistency for other properties that are interesting for certain systems.
BASE
This alternative consistency model is basically the subsumption of properties resulting from a system that provides availability and partition tolerance, but no strong consistency [Pri08]. While a strong consistency model as provided by ACID implies that all subsequent reads after a write yield the new, updated state for all clients and on all nodes of the system, this is weakened for BASE.
Instead, the weak consistency of BASE comes up with an inconsistency window, a period in which the propagation of the update is not yet guaranteed.
Basically available
The availability of the system even in case of failures represents a strong feature of the BASE model.
Soft state
No strong consistency is provided and clients must accept stale state under certain circumstances.
Eventually consistent
Consistency is provided in a "best effort" manner, yielding a consistent state as soon as possible.
The optimistic behaviour of BASE represents a best effort approach towards consistency, but is also said to be simpler and faster. Availability and scaling capacities are primary objectives at the expense of consistency. This has an impact on the application logic, as the developer must be aware of the possibility of stale data. On the other hand, favoring availability over consistency also has benefits for the application logic in some scenarios. For instance, a partial system failure of an ACID system might reject write operations, forcing the application to handle the data to be written somehow. A BASE system in turn might always accept writes, even in case of network failures, but they might not be visible for other nodes immediately. The applicability of relaxed consistency models depends very much on the application requirements. Strict constraints of balanced accounts for a banking application do not fit eventual consistency naturally. But many web applications built around social objects and user interactions can actually accept slightly stale data. When the inconsistency window is on average smaller than the time between request/response cycles of user interactions, a user might not even realize any kind of inconsistency at all.
http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
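The inconsistency window described in the BASE notes can be made concrete with a toy simulation. This is a sketch under assumptions, not a model of any real product: the `Replica` and `EventuallyConsistentStore` classes are hypothetical, replication is reduced to an explicit `sync()` call, and there is no networking, conflict resolution or failure handling.

```python
class Replica:
    """One node holding its own copy of the data."""

    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Toy store: a write lands on one replica and spreads only on sync()."""

    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # (key, value) updates not yet propagated

    def write(self, key, value):
        # Always accepted; only the first replica sees it immediately.
        self.replicas[0].data[key] = value
        self.pending.append((key, value))

    def read(self, key, replica_index):
        # May return stale state (or None) inside the inconsistency window.
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        # Propagate pending updates in order, closing the inconsistency window.
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica.data[key] = value
        self.pending.clear()


if __name__ == "__main__":
    store = EventuallyConsistentStore()
    store.write("status", "v1")
    store.sync()
    store.write("status", "v2")
    print(store.read("status", 0), store.read("status", 1))  # window is open
    store.sync()
    print(store.read("status", 0), store.read("status", 1))  # window is closed
```

Between the second write and the second `sync()`, a client reading from replica 1 sees the older value, which is the "soft state" that application code has to tolerate under BASE.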
For Google - http://code.google.com
For AWS - https://console.aws.amazon.com/console/home
Hadoop on AWS - http://wiki.apache.org/hadoop/AmazonEC2
About Data Science -- http://www.romymisra.com/the-new-job-market-rulers-data-scientists/
R language - http://www.r-project.org/
Infer.NET - http://research.microsoft.com/en-us/um/cambridge/projects/infernet/
Julia language -- http://julialang.org/
There is a plethora of languages to access, manipulate and process BigData. These languages fall into a couple of categories:
RESTful – simple, standards
ETL – Pig (Hadoop) is an example
Query – Hive (again Hadoop), lots of *QL
Analyze – R, Mahout, Infer.NET, DMX, etc. Applying statistical (data-mining) algorithms to the data output