This document discusses managing logs in the Elastic Stack, including Elasticsearch, Logstash, and Kibana. It describes how logs travel through the Elastic Stack components, from beats collecting the logs to Elasticsearch indexing and storing the data to Kibana visualizing the logs. It provides guidance on optimally sharding and indexing time-based log data in Elasticsearch, including using daily indices and index lifecycle management. It also covers scaling the cluster as data volume grows and testing to determine the optimal bulk size for indexing logs.
Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...Data Con LA
The Apriori Algorithm is an unsupervised learning technique for producing associative rules. This talk will explain the algorithm's implementation, explore how effective it can be when applied to big data, discuss how we use it at DataScience to do market basket analysis, and demonstrate some novel use cases involving the million song database, recipes, and other applications involving open data.
Rainbird: Realtime Analytics at Twitter (Strata 2011)Kevin Weil
Introducing Rainbird, Twitter's high volume distributed counting service for realtime analytics, built on Cassandra. This presentation looks at the motivation, design, and uses of Rainbird across Twitter.
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Codemotion
Monitoring an entire application is not a simple task, but with the right tools it is not a hard task either. However, events like Black Friday can push your application to the limit, and even cause crashes. As the system is stressed, it generates a lot more logs, which may crash the monitoring system as well. In this talk I will walk through the best practices when using the Elastic Stack to centralize and monitor your logs. I will also share some tricks to help you with the huge increase of traffic typical in Black Fridays.
Presto talk @ Global AI conference 2018 Bostonkbajda
Presented at Global AI Conference in Boston 2018:
http://www.globalbigdataconference.com/boston/global-artificial-intelligence-conference-106/speaker-details/kamil-bajda-pawlikowski-62952.html
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Facebook, Airbnb, Netflix, Uber, Twitter, LinkedIn, Bloomberg, and FINRA, Presto experienced an unprecedented growth in popularity in both on-premises and cloud deployments in the last few years. Presto is really a SQL-on-Anything engine in a single query can access data from Hadoop, S3-compatible object stores, RDBMS, NoSQL and custom data stores. This talk will cover some of the best use cases for Presto, recent advancements in the project such as Cost-Based Optimizer and Geospatial functions as well as discuss the roadmap going forward.
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionCodemotion
What’s important about a technology is what you can use it to do. I’ve looked at what a number of groups are doing with Apache Hadoop and NoSQL in production, and I will relay what worked well for them and what did not. Drawing from real world use cases, I show how people who understand these new approaches can employ them well in conjunction with traditional approaches and existing applications. Thread Detection, Datawarehouse optimization, Marketing Efficiency, Biometric Database are some examples exposed during this presentation.
Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...Data Con LA
The Apriori Algorithm is an unsupervised learning technique for producing associative rules. This talk will explain the algorithm's implementation, explore how effective it can be when applied to big data, discuss how we use it at DataScience to do market basket analysis, and demonstrate some novel use cases involving the million song database, recipes, and other applications involving open data.
Rainbird: Realtime Analytics at Twitter (Strata 2011)Kevin Weil
Introducing Rainbird, Twitter's high volume distributed counting service for realtime analytics, built on Cassandra. This presentation looks at the motivation, design, and uses of Rainbird across Twitter.
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Codemotion
Monitoring an entire application is not a simple task, but with the right tools it is not a hard task either. However, events like Black Friday can push your application to the limit, and even cause crashes. As the system is stressed, it generates a lot more logs, which may crash the monitoring system as well. In this talk I will walk through the best practices when using the Elastic Stack to centralize and monitor your logs. I will also share some tricks to help you with the huge increase of traffic typical in Black Fridays.
Presto talk @ Global AI conference 2018 Bostonkbajda
Presented at Global AI Conference in Boston 2018:
http://www.globalbigdataconference.com/boston/global-artificial-intelligence-conference-106/speaker-details/kamil-bajda-pawlikowski-62952.html
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Facebook, Airbnb, Netflix, Uber, Twitter, LinkedIn, Bloomberg, and FINRA, Presto experienced an unprecedented growth in popularity in both on-premises and cloud deployments in the last few years. Presto is really a SQL-on-Anything engine in a single query can access data from Hadoop, S3-compatible object stores, RDBMS, NoSQL and custom data stores. This talk will cover some of the best use cases for Presto, recent advancements in the project such as Cost-Based Optimizer and Geospatial functions as well as discuss the roadmap going forward.
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionCodemotion
What’s important about a technology is what you can use it to do. I’ve looked at what a number of groups are doing with Apache Hadoop and NoSQL in production, and I will relay what worked well for them and what did not. Drawing from real world use cases, I show how people who understand these new approaches can employ them well in conjunction with traditional approaches and existing applications. Thread Detection, Datawarehouse optimization, Marketing Efficiency, Biometric Database are some examples exposed during this presentation.
JanusGraph: What's Next, Project Status Update. Presented at Open Source Graph Technologies NYC Meetup on August 24, 2017. https://www.meetup.com/graphs/events/241136321/
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Demi Ben-Ari
Once you start working with distributed Big Data systems, you start discovering a whole bunch of problems you won’t find in monolithic systems.
All of a sudden to monitor all of the components becomes a big data problem itself.
In the talk we’ll mention all of the aspects that you should take in consideration when monitoring a distributed system once you’re using tools like:
Web Services, Apache Spark, Cassandra, MongoDB, Amazon Web Services.
Not only the tools, what should you monitor about the actual data that flows in the system?
And we’ll cover the simplest solution with your day to day open source tools, the surprising thing, that it comes not from an Ops Guy.
Real-time Analytics with Presto and Apache PinotXiang Fu
Presto Con 2021
In this world, most analytics products either focus on ad-hoc analytics, which requires query flexibility without guaranteed latency, or low latency analytics with limited query capability.
In this talk, we will explore how to get the best of both worlds using Apache Pinot and Presto:
1. How people do analytics today to trade-off Latency and Flexibility: Comparison over analytics on raw data vs pre-join/pre-cube dataset.
2. Introduce Apache Pinot as a column store for fast real-time data analytics and Presto Pinot Connector to cover the entire landscape.
3. Deep dive into Presto Pinot Connector to see how the connector does predicate and aggregation push down.
4. Benchmark results for Presto Pinot connector.
The JanusGraph project started at the Linux Foundation earlier this year, but it is not the new kid on the block. We'll start with a look at the origins and evolution of this open source graph database through the lens of a few IBM graph use cases. We'll discuss the new features in latest release of JanusGraph, and then take a look at future directions to explore together with the open community. Presented on October 18, 2017 at the Graph Technologies Meetup in Santa Clara, CA. https://www.meetup.com/_CAIDI/events/243122187/
PayPal prvoides an online transfer money network. Each payment flow connects senders and receivers into a giant network where each sender/receiver is a node and each transaction is an edge. Traditionally, the risk score of a transaction is computed based on the characteristics of the involved sender/receiver/transaction. In this talk, we will describe a novel network inference approach to calculate transaction risk score that also includes the risk profile of neighboring senders and receivers using Apache Giraph. The approach reveals additional risk insights not possible with the traditional method. We leverage Hadoop to support a graph computation involving hundreds of millions of nodes and edges.
Demonstrating 100 Gbps in and out of the CloudsIgor Sfiligoi
In this presentation, which was supposed to be presented at the cancelled CENIC 2020 Annual Conference, I present an overview of what is possible to achieve in terms of networking inside the Clouds and when exchanging data between cloud resources and on-prem equipment, with an emphasis on research hosted hardware.
There is increased awareness and recognition that public cloud providers do provide capabilities not found elsewhere, with elasticity being a major driver, and funding agencies are taking an increasingly positive stance toward public clouds.
The value of elastic scaling is, however, tightly coupled to the capabilities of the networks that connect all involved resources, both in the public clouds and at the various research institutions.
This presentation tries to shed some light on what is possible today.
Kiwi.com Reaches Cruising Altitude with ScyllaScyllaDB
Put your seat back and relax as Kiwi.com's Martin Strycek describes how this European travel booking giant was able to get above the clouds and make travel better with Scylla. Learn their topology, dataset, sizing, and how they continued scaling using Scylla's new SSTable format and the "bypass cache" command.
Community-Driven Graphs with JanusGraphJason Plurad
Graphs are well-suited for many use cases to express and process complex relationships among entities in enterprise and social contexts. Fueled by the growing interest in graphs, there are various graph databases and processing systems that dot the graph landscape. JanusGraph is a community-driven project that continues the legacy of Titan, a pioneer of open source graph databases. JanusGraph is a scalable graph database optimized for large scale transactional and analytical graph processing. In the session, we will introduce JanusGraph, which features full integration with the Apache TinkerPop graph stack. We will discuss JanusGraph's optimized storage model that relies on HBase for fast graph transversal and processing. Presented with Jing Chen (Jerry) He at HBaseCon West 2017, June 12, 2017.
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...Nathan Bijnens
Presentation I gave at the IBM Big Data Developers meetup group in San Jose, CA.
There is also a video available of this talk at:
https://www.youtube.com/watch?v=TSt49yPBmW0&t=7m59s
Monitoring an entire application is not a simple task, but with the right tools it is not a hard task either. However, events like Black Friday can push your application to the limit, and even cause crashes. As the system is stressed, it generates a lot more logs, which may crash the monitoring system as well. In this talk I will walk through the best practices when using the Elastic Stack to centralize and monitor your logs. I will also share some tricks to help you with the huge increase of traffic typical in Black Fridays.
Topics include:
monitoring architectures
optimal bulk size
distributing the load
cluster sizing
optimizing disk IO
monitoring queries
monitoring your monitoring system :P
Takeaway: best practices when building a monitoring system with the Elastic Stack, advanced tuning to optimize and increase event ingestion performance.
Managing your Black Friday Logs NDC OsloDavid Pilato
Monitoring an entire application is not a simple task, but with the right tools it is not a hard task either. However, events like Black Friday can push your application to the limit, and even cause crashes. As the system is stressed, it generates a lot more logs, which may crash the monitoring system as well. In this talk I will walk through the best practices when using the Elastic Stack to centralize and monitor your logs. I will also share some tricks to help you with the huge increase of traffic typical in Black Fridays.
Topics include:
* monitoring architectures
* optimal bulk size
* distributing the load
* index and shard size
* optimizing disk IO
Takeaway: best practices when building a monitoring system with the Elastic Stack, advanced tuning to optimize and increase event ingestion performance.
Managing your black friday logs - Code EuropeDavid Pilato
Monitoring an entire application is not a simple task, but with the right tools it is not a hard task either. However, events like Black Friday can push your application to the limit, and even cause crashes. As the system is stressed, it generates a lot more logs, which may crash the monitoring system as well. In this talk I will walk through the best practices when using the Elastic Stack to centralize and monitor your logs. I will also share some tricks to help you with the huge increase of traffic typical in Black Fridays.
Topics include:
* monitoring architectures
* optimal bulk size
* distributing the load
* index and shard size
* optimizing disk IO
Takeaway: best practices when building a monitoring system with the Elastic Stack, advanced tuning to optimize and increase event ingestion performance.
JanusGraph: What's Next, Project Status Update. Presented at Open Source Graph Technologies NYC Meetup on August 24, 2017. https://www.meetup.com/graphs/events/241136321/
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Demi Ben-Ari
Once you start working with distributed Big Data systems, you start discovering a whole bunch of problems you won’t find in monolithic systems.
All of a sudden to monitor all of the components becomes a big data problem itself.
In the talk we’ll mention all of the aspects that you should take in consideration when monitoring a distributed system once you’re using tools like:
Web Services, Apache Spark, Cassandra, MongoDB, Amazon Web Services.
Not only the tools, what should you monitor about the actual data that flows in the system?
And we’ll cover the simplest solution with your day to day open source tools, the surprising thing, that it comes not from an Ops Guy.
Real-time Analytics with Presto and Apache PinotXiang Fu
Presto Con 2021
In this world, most analytics products either focus on ad-hoc analytics, which requires query flexibility without guaranteed latency, or low latency analytics with limited query capability.
In this talk, we will explore how to get the best of both worlds using Apache Pinot and Presto:
1. How people do analytics today to trade-off Latency and Flexibility: Comparison over analytics on raw data vs pre-join/pre-cube dataset.
2. Introduce Apache Pinot as a column store for fast real-time data analytics and Presto Pinot Connector to cover the entire landscape.
3. Deep dive into Presto Pinot Connector to see how the connector does predicate and aggregation push down.
4. Benchmark results for Presto Pinot connector.
The JanusGraph project started at the Linux Foundation earlier this year, but it is not the new kid on the block. We'll start with a look at the origins and evolution of this open source graph database through the lens of a few IBM graph use cases. We'll discuss the new features in latest release of JanusGraph, and then take a look at future directions to explore together with the open community. Presented on October 18, 2017 at the Graph Technologies Meetup in Santa Clara, CA. https://www.meetup.com/_CAIDI/events/243122187/
PayPal prvoides an online transfer money network. Each payment flow connects senders and receivers into a giant network where each sender/receiver is a node and each transaction is an edge. Traditionally, the risk score of a transaction is computed based on the characteristics of the involved sender/receiver/transaction. In this talk, we will describe a novel network inference approach to calculate transaction risk score that also includes the risk profile of neighboring senders and receivers using Apache Giraph. The approach reveals additional risk insights not possible with the traditional method. We leverage Hadoop to support a graph computation involving hundreds of millions of nodes and edges.
Demonstrating 100 Gbps in and out of the CloudsIgor Sfiligoi
In this presentation, which was supposed to be presented at the cancelled CENIC 2020 Annual Conference, I present an overview of what is possible to achieve in terms of networking inside the Clouds and when exchanging data between cloud resources and on-prem equipment, with an emphasis on research hosted hardware.
There is increased awareness and recognition that public cloud providers do provide capabilities not found elsewhere, with elasticity being a major driver, and funding agencies are taking an increasingly positive stance toward public clouds.
The value of elastic scaling is, however, tightly coupled to the capabilities of the networks that connect all involved resources, both in the public clouds and at the various research institutions.
This presentation tries to shed some light on what is possible today.
Kiwi.com Reaches Cruising Altitude with ScyllaScyllaDB
Put your seat back and relax as Kiwi.com's Martin Strycek describes how this European travel booking giant was able to get above the clouds and make travel better with Scylla. Learn their topology, dataset, sizing, and how they continued scaling using Scylla's new SSTable format and the "bypass cache" command.
Community-Driven Graphs with JanusGraphJason Plurad
Graphs are well-suited for many use cases to express and process complex relationships among entities in enterprise and social contexts. Fueled by the growing interest in graphs, there are various graph databases and processing systems that dot the graph landscape. JanusGraph is a community-driven project that continues the legacy of Titan, a pioneer of open source graph databases. JanusGraph is a scalable graph database optimized for large scale transactional and analytical graph processing. In the session, we will introduce JanusGraph, which features full integration with the Apache TinkerPop graph stack. We will discuss JanusGraph's optimized storage model that relies on HBase for fast graph transversal and processing. Presented with Jing Chen (Jerry) He at HBaseCon West 2017, June 12, 2017.
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...Nathan Bijnens
Presentation I gave at the IBM Big Data Developers meetup group in San Jose, CA.
There is also a video available of this talk at:
https://www.youtube.com/watch?v=TSt49yPBmW0&t=7m59s
Monitoring an entire application is not a simple task, but with the right tools it is not a hard task either. However, events like Black Friday can push your application to the limit, and even cause crashes. As the system is stressed, it generates a lot more logs, which may crash the monitoring system as well. In this talk I will walk through the best practices when using the Elastic Stack to centralize and monitor your logs. I will also share some tricks to help you with the huge increase of traffic typical in Black Fridays.
Topics include:
monitoring architectures
optimal bulk size
distributing the load
cluster sizing
optimizing disk IO
monitoring queries
monitoring your monitoring system :P
Takeaway: best practices when building a monitoring system with the Elastic Stack, advanced tuning to optimize and increase event ingestion performance.
Managing your Black Friday Logs NDC OsloDavid Pilato
Monitoring an entire application is not a simple task, but with the right tools it is not a hard task either. However, events like Black Friday can push your application to the limit, and even cause crashes. As the system is stressed, it generates a lot more logs, which may crash the monitoring system as well. In this talk I will walk through the best practices when using the Elastic Stack to centralize and monitor your logs. I will also share some tricks to help you with the huge increase of traffic typical in Black Fridays.
Topics include:
* monitoring architectures
* optimal bulk size
* distributing the load
* index and shard size
* optimizing disk IO
Takeaway: best practices when building a monitoring system with the Elastic Stack, advanced tuning to optimize and increase event ingestion performance.
Managing your black friday logs - Code EuropeDavid Pilato
Monitoring an entire application is not a simple task, but with the right tools it is not a hard task either. However, events like Black Friday can push your application to the limit, and even cause crashes. As the system is stressed, it generates a lot more logs, which may crash the monitoring system as well. In this talk I will walk through the best practices when using the Elastic Stack to centralize and monitor your logs. I will also share some tricks to help you with the huge increase of traffic typical in Black Fridays.
Topics include:
* monitoring architectures
* optimal bulk size
* distributing the load
* index and shard size
* optimizing disk IO
Takeaway: best practices when building a monitoring system with the Elastic Stack, advanced tuning to optimize and increase event ingestion performance.
Managing your black friday logs Voxxed LuxembourgDavid Pilato
Surveiller une application complexe n'est pas une tâche aisée, mais avec les bons outils, ce n'est pas si sorcier. Néanmoins, des périodes fortes telles que les opérations de type "Black Friday" (Vendredi noir) ou période de Noël peuvent pousser votre application aux limites de ce qu'elle peut supporter, ou pire, la faire crasher. Parce que le système est fortement sollicité, il génère encore davantage de logs qui peuvent également mettre à mal votre système de supervision.
Dans cette session, j'aborderai les bonnes pratiques d'utilisation de la suite Elastic pour centraliser et monitorer vos logs. Je partagerai également avec vous quelques trucs et astuces pour vous aider à passer sans souci vos Vendredis noirs !
Nous verrons :
* Les architectures de monitoring
* Trouver la taille optimale pour l'API _bulk
* Distribuer la charge
* Taille des index et des shards
* Optimiser les E/S disque
Vous ressortirez de la session avec : des bonnes pratiques pour bâtir son système de monitoring avec la suite Elastic, le tuning avancé pour optimiser les performances d'ingestion et de recherche.
Azure + DataStax Enterprise (DSE) Powers Office365 Per User StoreDataStax Academy
We will present our Office 365 use case scenarios, why we chose Cassandra + Spark, and walk through the architecture we chose for running DSE on Azure.
The presentation will feature demos on how you too can build similar applications.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2l2Rr6L.
Doug Daniels discusses the cloud-based platform they have built at DataDog and how it differs from a traditional datacenter-based analytics stack. He walks through the decisions they have made at each layer, covers the pros and cons of these decisions and discusses the tooling they have built. Filmed at qconsf.com.
Doug Daniels is a Director of Engineering at Datadog, where he works on high-scale data systems for monitoring, data science, and analytics. Prior to joining Datadog, he was CTO at Mortar Data and an architect and developer at Wireless Generation, where he designed data systems to serve more than 4 million students in 49 states.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2lGNybu.
Stefan Krawczyk discusses how his team at StitchFix use the cloud to enable over 80 data scientists to be productive. He also talks about prototyping ideas, algorithms and analyses, how they set up & keep schemas in sync between Hive, Presto, Redshift & Spark and make access easy for their data scientists, etc. Filmed at qconsf.com..
Stefan Krawczyk is Algo Dev Platform Lead at StitchFix, where he’s leading development of the algorithm development platform. He spent formative years at Stanford, LinkedIn, Nextdoor & Idibon, working on everything from growth engineering, product engineering, data engineering, to recommendation systems, NLP, data science and business intelligence.
Using a Fast Operational Database to Build Real-time Streaming AggregationsVoltDB
Simplicity, accuracy, speed; these are three things everyone wants from their data architecture. Join this webinar presented by Behzad Pirvali, Performance Architect at MaxCDN and Peter Vescuso, CMO at VoltDB to learn how MaxCDN used VoltDB, the world’s fastest operational DB with fast data pipeline, to reduce the number of managed environments by 2/3 times with 1/10th of CPU cycles required with alternative solutions. All while achieving 100% billing accuracy on 32 TB of daily web server data. The fill recording of this webinar is also available here: http://learn.voltdb.com/WRMaxCDN.html
Machine learning in the physical world by Kip Larson from AWS IoTBill Liu
Presented at AI NEXTCon Seattle 1/17-20, 2018
http://aisea18.xnextcon.com
join our free online AI group with 50,000+ tech engineers to learn and practice AI technology, including: latest AI news, tech articles/blogs, tech talks, tutorial videos, and hands-on workshop/codelabs, on machine learning, deep learning, data science, etc..
Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami...Databricks
Aggregation based features account for a quarter of the several 1000s features used by the ML-based decisioning system by the Risk team at Uber. We observed several repetitive, cumbersome steps needed for onboarding a feature, every single time. Therefore, to accelerate developer velocity, and to enable Feature Engineering at scale, we decided to develop a generic spark based infrastructure to simplify the process to no more than a simple spec file, containing a parameterized query, along with some metadata on where the feature should be aggregated and stored.
In the presentation, we will describe the architecture of the final solution, highlighting some of the advanced capabilities like backfill support and self-healing for correctness. We will showcase how, using data stored in Hive and using Spark, we developed a highly scalable solution to carry out feature aggregation in an incremental way. By dividing data aggregation responsibility across the realtime access layer, and the batch computation components, we ensured that only entities for which there is actual value changes are dispersed to our real-time access store (Cassandra). We will share how we did data modeling in Cassandra using its native capabilities such as counters, and how we worked around some of the limitations of Cassandra. We will also cover the details about the access service how we do different types of feature stitching together. How, based on our data model we were able to ensure that all the feature for an entity with the same aggregation window, were queried via a single query. Finally, we will cover some of the details on how these incremental aggregated features have enabled shorter turnaround times for the models using such features.
Benchmark Showdown: Which Relational Database is the Fastest on AWS?Clustrix
Do you have a high-value, high throughput application running on AWS? Are you moving part or all of your infrastructure to AWS? Do you have a high-transaction workload that is only expected to grow as your company grows? Choosing the right database for your move to AWS can make you a hero or a goat. Be a hero!
Databases are the mission-critical lifeline of most businesses. For years MySQL has been the easy choice -- but the popularity of the cloud and new products like Aurora, RDS MySQL and ClustrixDB have given customers choices and options that can help them work smarter and more efficiently.
Enterprise Strategy Group (ESG) presents their findings from a recent performance benchmark test configured for high-transaction, low-latency workloads running on AWS.
In this webinar, you will learn:
How high-transaction, high-value database workloads perform when run on three popular databases solutions running on AWS.
How key metrics like transactions per second (tps) and database response time (latency) can affect performance and customer satisfaction.
How the ability to scale both database reads and writes is the key to unlocking performance on AWS
In this talk, we share our experience when we build up our data pipeline. We went from mongodb, and migrated to cassandra, and now we use kafka and spark to handle our data. We also talk about what problem encounter, why we select these solutions, and where we will go next.
Data Warehousing with Amazon Redshift: Data Analytics Week at the SF LoftAmazon Web Services
Data Warehousing with Amazon Redshift: Data Analytics Week at the San Francisco Loft
A closer look at the fast, fully managed data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools. We'll show how to run complex analytic queries against petabytes of structured data, using sophisticated query optimization, columnar storage on high-performance local disks, and massively parallel query execution.
Level: Beginner
Speakers:
Jay Formosa - Solutions Architect, AWS
Sudhir Gupta - Partner Solutions Architect, Redshift Specialist, AWS
Glassbeam: Ad-hoc Analytics on Internet of Complex Things with Apache Cassand...DataStax Academy
Learn how Apache Cassandra can be paired with Apache Spark to build a powerful OLAP solution for analyzing large scale operational data from Internet of Complex Things. Cassandra provides a scalable, flexible, and highly available data store. Spark provides a fast and scalable analytics framework. The two can be combined to build a powerful and flexible solution for analyzing IoT data. We will discuss what kind of analytics Cassandra can handle by itself and when Spark is required. We will also share some of the challenges and trade-offs that are important to know.
by Taz Sayed, Sr Technical Account Manager AWS and Marie Yap, Enterprise Solutions Architect AWS
AWS Data & Analytics Week is an opportunity to learn about Amazon’s family of managed analytics services. These services provide easy, scalable, reliable, and cost-effective ways to manage your data in the cloud. We explain the fundamentals and take a technical deep dive into Amazon Redshift data warehouse; Data Lake services including Amazon EMR, Amazon Athena, & Amazon Redshift Spectrum; Log Analytics with Amazon Elasticsearch Service; and data preparation and placement services with AWS Glue and Amazon Kinesis. You'll will learn how to get started, how to support applications, and how to scale.
A closer look at the fast, fully managed data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools. We'll show how to run complex analytic queries against petabytes of structured data, using sophisticated query optimization, columnar storage on high-performance local disks, and massively parallel query execution.
Level: Beginner
Speakers:
Jay Formosa - Solutions Architect, AWS
Aser Moustafa - Data Warehouse Specialist Solutions Architect, AWS
Data Warehousing with Amazon Redshift: Data Analytics Week SFAmazon Web Services
Data Analytics Week at the San Francisco Loft
Data Warehousing with Amazon Redshift
Asser Moustafa - Data Warehouse Specialist Solutions Architect, AWSA closer look at the fast, fully managed data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools. We'll show how to run complex analytic queries against petabytes of structured data, using sophisticated query optimization, columnar storage on high-performance local disks, and massively parallel query execution.
Speakers:
Jay Formosa - Solutions Architect, AWS
Asser Moustafa - Data Warehouse Specialist Solutions Architect, AWS
Similar to Pablo Musa - Managing your Black Friday Logs - Codemotion Amsterdam 2019 (20)
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Codemotion
Increased complexity makes it very hard and time-consuming to keep your software bug-free and secure. We introduce fuzz-testing as a method for automatically and continuously discovering vulnerabilities hidden in your code. The talk will explain how fuzzing works and how to integrate fuzz-testing into your Software Development Life Cycle to increase your code’s security.
Pompili - From hero to_zero: The FatalNoise neverending storyCodemotion
It was 1993 when we decided to venture in a beat'em up game for Amiga. The Catalypse's success story pushed me and my comrade to create something astonishing for this incredible game machine... but things went harder, assumptions were slightly different, and italian competitors appeared out of nowhere... the project died in 1996. Story ended? Probably not...
Il Commodore 65 è un prototipo di personal computer che Commodore avrebbe dovuto mettere in commercio quale successore del Commodore 64. Purtroppo la sua realizzazione si fermò appunto allo stadio prototipale. Racconterò l'affascinante storia del suo sviluppo ed il perchè della soppressione del progetto ormai ad un passo dalla immissione in commercio.
Rivivere l'ebbrezza di progettare un vecchio computer o una consolle da bar è oggi possibile sfruttando le FPGA, ovvero logiche programmabili che consentono a chiunque di progettare il proprio hardware o di ricrearne uno del passato. In questa sessione si racconta come dal reverse engineering dell'hardware di vecchie glorie come il Commodore 64 e lo ZX Spectrum sia stato possibile farle rivivere attraverso tecnologie oggi alla portata di tutti.
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Codemotion
There's a lot of talk about blockchain, but how does the technology behind it actually work? For developers, getting some hands-on experience is the fastest way to get familiair with new technologies. So let's build a blockchain, then! In this session, we're going to build one in plain old Java, and have it working in 40 minutes. We'll cover key concepts of a blockchain: transactions, blocks, mining, proof-of-work, and reaching consensus in the blockchain network. After this session, you'll have a better understanding of core aspects of blockchain technology.
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Codemotion
When was the last time you were truly lost? Thanks to the maps and location technology in our phones, a whole generation has now grown up in a world where getting lost is truly a thing of the past. Location technology goes far beyond maps in the palm of our hand, however. In this talk, we will explore how a ridesharing app works. How do we discover our destination?How do we find the closest driver? How do we display this information on a map? How do we find the best route?To answer these questions,we will be learning about a variety of location APIs, including Maps, Positioning, Geocoding etc.
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Codemotion
Eward Driehuis, SecureLink's research chief, will guide you through the bumpy ride we call the cyber threat landscape. As the industry has over a decade of experience of dealing with increasingly sophisticated attacks, you might be surprised to hear more attacks slip through the cracks than ever. From analyzing 20.000 of them in 2018, backed by a quarter of a million security events and over ten trillion data points, Eward will outline why this happens, how attacks are changing, and why it doesn't matter how neatly or securely you code.
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 - Codemotion
IoT revolution is ended. Thanks to hardware improvement, building an intelligent ecosystem is easier than never before for both startups and large-scale enterprises. The real challenge is now to connect, process, store and analyze data: in the cloud, but also, at the edge. We’ll give a quick look on frameworks that aggregate dispersed devices data into a single global optimized system allowing to improve operational efficiency, to predict maintenance, to track asset in real-time, to secure cloud-connected devices and much more.
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Codemotion
What if Virtual Reality glasses could transform your environment into a three-dimensional work of art in realtime in the style of a painting from Van Gogh? One of the many interesting developments in the field of Deep Learning is the so called "Style Transfer". It describes a possibility to create a patchwork (or pastiche) from two images. While one of these images defines the the artistic style of the result picture, the other one is used for extracting the image content. A team from TNG Technology Consulting managed to build an AI showcase using OpenCV and Tensorflow to realize such goggles.
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Codemotion
Blockchain (and Cryptocurrency) is an evolution of 20-year old research from scientists like Chaum, Lamport, and Castro & Liskov. Due to the current hype, it's hard to distinguish beneficial aspects of the technology from a desire for a "silver bullet" for device security, verifiable logistics, or "saving democracy". The problem: blockchain introduces new security challenges - and blind adoption without understanding reduces overall security. In this talk, Melanie Rieback and Klaus Kursawe explain the pitfalls and limits of blockchain, so you can avoid making your applications LESS secure.
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Codemotion
Networking is a core part of computing in the digital world we inhabit. But, how well do you know how it works? Do you understand all the moving parts of the OSI stack inside your computer, and how the network is actually put together? How can this ever work? This guided safari of layers, standards, protocols, and happenstance will bring us close to the copper wire, and up through the layers of CDMA/CD, ARP, routing and HTTP. We will make a few excursions through patchworks that still work forty years later, and cleverly designed mechanisms that show that simplicity is the only way to last.
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Codemotion
Performance tests are not only an important instrument for understanding a system and its runtime environment. It is also essential in order to check stability and scalability – non-functional requirements that might be decisive for success. But won't my cloud hosting service scale for me as long as I can afford it? Yes, but… It only operates and scales resources. It won't automatically make your system fast, stable and scalable. This talk shows how such and comparable questions can be clarified with performance tests and how DevOps teams benefit from regular test practise.
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Codemotion
Sascha will demonstrate the opportunities and challenges of Conversational AI learned from the practice. Both Technology and User Experience will be covered introducing a process finding micro-moments, writing happy paths, gathering intents, designing the conversational flow, and finally publishing on almost all channels including Voice Services and Chatbots. Valuable for enterprises, developers, and designers. All live on stage in just minutes and with almost no code.
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Codemotion
A key challenge we face at Pacmed is quickly calibrating and deploying our tools for clinical decision support in different hospitals, where data formats may vary greatly. Using Intensive Care Units as a case study, I’ll delve into our scalable Python pipeline, which leverages Pandas’ split-apply-combine approach to perform complex feature engineering and automatic quality checks on large time-varying data, e.g. vital signs. I’ll show how we use the resulting flexible and interpretable dataframes to quickly (re)train our models to predict mortality, discharge, and medical complications.
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Codemotion
Coolblue is a proud Dutch company, with a large internal development department; one that truly takes CI/CD to heart. Empowerment through automation is at the heart of these development teams, and with more than 1000 deployments a day, we think it's working out quite well. In this session, Pat Hermens (a Development Managers) will step you through what enables us to move so quickly, which tools we use, and most importantly, the mindset that is required to enable development teams to deliver at such a rapid pace.
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...Codemotion
Quantum computers can use all of the possible pathways generated by quantum decisions to solve problems that will forever remain intractable to classical compute power. As the mega players vie for quantum supremacy and Rigetti announces its $1M "quantum advantage" prize, we live in exciting times. IBM-Q and Microsoft Q# are two ways you can learn to program quantum computers so that you're ready when the quantum revolution comes. I'll demonstrate some quantum solutions to problems that will forever be out of reach of classical, including organic chemistry and large number factorisation.
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Codemotion
Chinese food exploded across America in the early 20th century, rapidly adapting to local tastes while also spreading like wildfire. How was it able to spread so fast? The GY6 is a family of scooter engines that has achieved near total ubiquity in Europe. It is reliable and cheap to manufacture, and it's made in factories across China. How are these factories able to remain afloat? Chinese-American food and the GY6 are both riveting studies in product-market fit, and both are the product of a distributed open source-like development model. What lessons can we learn for open source software?
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Codemotion
The design space has exploded in size within the last few years and Sketch is one of the most important milestones to represent the phenomenon. But behind the scenes of this growing reality there is a remote team that revolutionizes the design space all without leaving the home office. This talk will present how Sketch has grown to become a modern, product designer's tool.
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Codemotion
Would you fly in a plane designed by a craftsman or would you prefer your aircraft to be designed by engineers? We are learning that science and empiricism works in software development, maybe now is the time to redefine what “Software Engineering” really means. Software isn't bridge-building, it is not car or aircraft development either, but then neither is Chemical Engineering. Engineering is different in different disciplines. Maybe it is time for us to begin thinking about retrieving the term "Software Engineering" maybe it is time to define what our "Engineering" discipline should be.
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Codemotion
What is the job of a CTO and how does it change as a startup grows in size and scale? As a CTO, where should you spend your focus? As an engineer aspiring to be a CTO, what skills should you pursue? In this inspiring and personal talk, I describe my journey from early Red Hat engineer to CTO at Bloomon. I will share my view on what it means to be a CTO, and ultimately answer the question: Should the CTO be coding?
The Metaverse and AI: how can decision-makers harness the Metaverse for their...Jen Stirrup
The Metaverse is popularized in science fiction, and now it is becoming closer to being a part of our daily lives through the use of social media and shopping companies. How can businesses survive in a world where Artificial Intelligence is becoming the present as well as the future of technology, and how does the Metaverse fit into business strategy when futurist ideas are developing into reality at accelerated rates? How do we do this when our data isn't up to scratch? How can we move towards success with our data so we are set up for the Metaverse when it arrives?
How can you help your company evolve, adapt, and succeed using Artificial Intelligence and the Metaverse to stay ahead of the competition? What are the potential issues, complications, and benefits that these technologies could bring to us and our organizations? In this session, Jen Stirrup will explain how to start thinking about these technologies as an organisation.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.