All Things Cloud Developer Meetup.
Filtering From the Firehose: Real Time Social Media Streaming with Jim Moffitt from Gnip. Gnip is the world's largest and most trusted provider of social data.
Learn about collecting and filtering social media data with streaming APIs. Jim will cover best practices, use case examples and live demos of filtering data from Twitter.
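The core idea of filtering a firehose can be sketched in a few lines. This is a hypothetical example of keyword filtering over tweet-like records, analogous to the track/filter rules offered by streaming APIs such as Twitter's; the record shapes and function names are illustrative, not the real Gnip/Twitter API.

```python
def matches(rules, text):
    """Return True if any rule keyword appears in the text (case-insensitive)."""
    text = text.lower()
    return any(rule.lower() in text for rule in rules)

def filter_stream(stream, rules):
    """Yield only the records whose 'text' field matches a filter rule."""
    for record in stream:
        if matches(rules, record.get("text", "")):
            yield record

# A tiny stand-in for the firehose
firehose = [
    {"id": 1, "text": "Flood warning issued for Boulder Creek"},
    {"id": 2, "text": "Great coffee this morning"},
    {"id": 3, "text": "River gauge shows rising flood levels"},
]

hits = list(filter_stream(firehose, ["flood"]))
print([r["id"] for r in hits])  # → [1, 3]
```

A real client would apply the same predicate to a long-lived HTTP stream rather than a list, but the filtering logic is the same.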
Presentation given by Sungwook Yoon, MapR Data Scientist
Topics Covered:
Advanced Persistent Threat (APT)
Big Data + Threat Intelligence
Hadoop + Spark Solution
Example Detection Algorithm Development Scenarios (most of them are still open problems)
44CON 2014: Using Hadoop for malware, network, forensics and log analysis - Michael Boman
The number of new malware samples exceeds a hundred thousand a day, network speeds are measured in multiples of ten gigabits per second, computer systems have terabytes of storage, and the log files just keep piling up. By using Hadoop you can tackle these problems in a whole different way, and “Too Much Data to Process” will be a thing of the past.
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose - Allen Day, PhD
Architecting R into the Storm Application Development Process
~~~~~
The business need for real-time analytics at large scale has focused attention on the use of Apache Storm, but an approach that is sometimes overlooked is the use of Storm and R together. This novel combination of real-time processing with Storm and the practical but powerful statistical analysis offered by R substantially extends the usefulness of Storm as a solution to a variety of business critical problems. By architecting R into the Storm application development process, Storm developers can be much more effective. The aim of this design is not necessarily to deploy faster code but rather to deploy code faster. Just a few lines of R code can be used in place of lengthy Storm code for the purpose of early exploration – you can easily evaluate alternative approaches and quickly make a working prototype.
In this presentation, Allen will build a bridge from basic real-time business goals to the technical design of solutions. We will take an example of a real-world use case, compose an implementation of the use case as Storm components (spouts, bolts, etc.) and highlight how R can be an effective tool in prototyping a solution.
Python has long been a premier, flexible, and powerful open-source language that is easy to learn, easy to use, and has powerful libraries for data manipulation and analysis. Its simple, easy-to-learn syntax is very accessible to new programmers and will feel familiar to those coming from Matlab, C/C++, Java, or Visual Basic. Python is general purpose and comparatively easy to learn, and its adoption for analytical and quantitative computing continues to grow. For over a decade, Python has been used in scientific computing and highly quantitative domains such as finance, oil and gas, physics, and signal processing.
Social Security Company Nexgate's Success Relies on Apache Cassandra - DataStax Academy
The accuracy of any security product is directly tied to the breadth of the corpus of data upon which it is built. For Nexgate, this means that the success of our products is inextricably tied to our ability to save everything we've ever scanned, forever, but in a way that is still readily accessible. In the days before NoSQL, this was hard. This is how DataStax and Cassandra make it easy.
In 2009, Imperva published a report on 32 million breached passwords entitled "Consumer Password Worst Practices." Since then, successive breaches have highlighted consumers' inability to make sufficient password choices. Enterprises can no longer rely on employees, partners or consumers when it comes to password security. Instead, responsibility rests on enterprises to put in place proper password security policies and procedures as a part of a comprehensive data security discipline. Passwords should be viewed by security teams as highly valuable data - even if PCI or other security mandates don't apply. This paper guides enterprises to rectify poor password management practices.
Warcbase: Building a Scalable Platform on HBase and Hadoop - Part Two, Histor... - Ian Milligan
This was the second part of a joint presentation I did with Jimmy Lin (Maryland) at the "Web Archiving Collaboration: New Tools and Models" conference at Columbia University, New York NY on 4 June 2015.
Floods of Twitter Data - StampedeCon 2016 - StampedeCon
The Twitter data firehose delivers hundreds of millions of Tweets every day. This data flood comes with many ‘big data’ challenges in terms of both data volumes and velocities. This presentation will focus on tools that help you find your data ‘signal’ of interest, and will include several demos that focus on using Twitter for flood early-warning systems. These demos will highlight the real-time, public broadcast nature of Twitter, examples of real-time firehose filtering, as well as recent Internet of Things (IoT) Twitter integrations.
C* Summit 2013: Dude, Where's My Tweet? Taming the Twitter Firehose by Andrew... - DataStax Academy
Gnip ingests and must serve out hundreds of millions of social activities every day, and social platforms are only growing. This makes the scalability of applications essential for Gnip. Enter Cassandra. Problem solved, right? Not exactly; Gnip's relationship with Cassandra was not all rainbows and unicorns. In this session we will walk you through why we began looking at Cassandra as a data store in the first place and the valuable lessons we learned with Cassandra that have made it an invaluable part of our infrastructure.
Everything We Wish We Knew About Twitter When We Started
A look at the basics of getting started with Twitter, how to grow your following and your engagement, and how to get the most value and fun out of a truly amazing network.
We Are Social's comprehensive new report covers internet, social media and mobile usage statistics from all over the world. It contains more than 350 infographics, including global snapshots, regional overviews, and in-depth profiles of 30 of the world's largest economies. For a more insightful analysis of these numbers, please visit http://bit.ly/SDMW2015
Big Data made easy in the era of the Cloud - Demi Ben-Ari
A talk about the ease of use and handling of Big Data technologies in the cloud, using Google Cloud Platform, Amazon Web Services, and the tools around them, showing the common problems and how we can solve them with simple tools.
Big Data to SMART Data: Process scenario
A scenario for implementing a process that transforms raw data into exploitable, representative data, covering stream processing, distributed systems, messaging, storage in a NoSQL environment, and graphical data visualization within a Big Data ecosystem, using the following technologies:
Apache Storm, Apache Zookeeper, Apache Kafka, Apache Cassandra, Apache Spark and Data-Driven Documents (D3.js).
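The ingest-process-store chain this scenario describes can be modeled in miniature. This is an illustrative sketch (not the talk's actual code), with an in-memory queue standing in for a Kafka topic and a dict standing in for a Cassandra table; all names are assumptions.

```python
from collections import deque

topic = deque()   # stands in for a Kafka topic
table = {}        # stands in for a Cassandra table

def produce(event):
    """Producer / Storm spout side: append the event to the topic."""
    topic.append(event)

def process_and_store():
    """Storm bolt side: consume each event, enrich it, then persist it."""
    while topic:
        event = topic.popleft()
        event["processed"] = True
        table[event["key"]] = event

produce({"key": "sensor-1", "value": 42})
produce({"key": "sensor-2", "value": 7})
process_and_store()
print(sorted(table))  # → ['sensor-1', 'sensor-2']
```

In the real architecture each stage runs as an independent, distributed component, but the data flow between them follows this same produce/consume/persist shape.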
Kostas Tzoumas - Stream Processing with Apache Flink® - Ververica
In this talk the basics of Apache Flink are covered: why the project exists, where it came from, what gap it fills, how it differs from other stream processing projects, what it is being used for, and where it is headed. In short, streaming data is the new trend, and for very good reasons. Most data is produced continuously, and it makes sense that it is processed and analysed continuously. Whether the need is more real-time products, adopting micro-services, or building continuous applications, stream processing technology offers to simplify the data infrastructure stack and reduce the latency to decisions.
Apache Flink: Real-World Use Cases for Streaming Analytics - Slim Baltagi
This face-to-face talk about Apache Flink in Sao Paulo, Brazil is the first event of its kind in Latin America! It explains how Apache Flink 1.0, announced on March 8th, 2016 by the Apache Software Foundation, marks a new era of Big Data analytics, and in particular real-time streaming analytics. The talk maps Flink's capabilities to real-world use cases that span multiple verticals such as financial services, healthcare, advertising, oil and gas, retail, and telecommunications.
In this talk, you will learn more about:
1. What is Apache Flink Stack?
2. Batch vs. Streaming Analytics
3. Key Differentiators of Apache Flink for Streaming Analytics
4. Real-World Use Cases with Flink for Streaming Analytics
5. Who is using Flink?
6. Where do you go from here?
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...) - Confluent
Tinder’s Quickfire Pipeline powers all things data at Tinder. It was originally built using AWS Kinesis Firehoses and has since been extended to use both Kafka and other event buses. It is the core of Tinder’s data infrastructure. This rich data flow of both client and backend data has been extended to service a variety of needs at Tinder, including Experimentation, ML, CRM, and Observability, allowing backend developers easier access to shared client-side data. We perform this using many systems, including Kafka, Spark, Flink, Kubernetes, and Prometheus. Many of Tinder’s systems were natively designed in an RPC-first architecture.
Topics we’ll discuss on decoupling your system at scale via event-driven architectures include:
– Powering ML, backend, observability, and analytical applications at scale, including an end-to-end walkthrough of the processes that allow non-programmers to write and deploy event-driven data flows.
– Showing, end to end, the use of dynamic event processing that creates other stream processes, via a dynamic control-plane topology pattern and a broadcast state pattern
– How to manage the unavailability of cached data that would normally come from repeated API calls for data that’s being backfilled into Kafka, all online! (and why this is not necessarily a “good” idea)
– Integrating common OSS frameworks and libraries like Kafka Streams, Flink, Spark and friends to encourage the best design patterns for developers coming from traditional service oriented architectures, including pitfalls and lessons learned along the way.
– Why and how to avoid overloading microservices with excessive RPC calls from event-driven streaming systems
– Best practices in common data flow patterns, such as shared state via RocksDB + Kafka Streams as well as the complementary tools in the Apache Ecosystem.
– The simplicity and power of streaming SQL with microservices
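The "shared state via RocksDB + Kafka Streams" pattern in the list above can be sketched simply: a local store (a dict here, RocksDB in Kafka Streams) is rebuilt by replaying a changelog of key/value events, so state survives restarts. This is a hedged illustration under assumed names, not Tinder's or Kafka Streams' actual code.

```python
# A changelog of (key, value) events, as a Kafka changelog topic would hold.
changelog = [
    ("user-1", {"swipes": 10}),
    ("user-2", {"swipes": 3}),
    ("user-1", {"swipes": 11}),  # a later event overwrites earlier state
]

def restore(log):
    """Replay the changelog into a local store; the last write per key wins."""
    store = {}
    for key, value in log:
        store[key] = value
    return store

state = restore(changelog)
print(state["user-1"]["swipes"])  # → 11
```

Because the log is the source of truth, any instance can rebuild an identical local store, which is what makes the pattern work for stateful stream processors.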
Streaming data presents new challenges for statistics and machine learning on extremely large data sets. Tools such as Apache Storm, a stream processing framework, can power a range of data analytics but lack advanced statistical capabilities. These slides are from the ApacheCon talk, which discussed developing streaming algorithms with the flexibility of both Storm and R, a statistical programming language.
In the talk I discussed why and how to use Storm and R to develop streaming algorithms; in particular I focused on:
• Streaming algorithms
• Online machine learning algorithms
• Use cases showing how to process hundreds of millions of events a day in (near) real time
See: https://apacheconna2015.sched.org/event/09f5a1cc372860b008bce09e15a034c4#.VUf7wxOUd5o
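As a minimal example of the kind of online algorithm discussed above (my own sketch, not from the talk): Welford's method maintains a running mean and variance over a stream in O(1) memory, so it fits naturally inside a Storm bolt or an R streaming prototype.

```python
class RunningStats:
    """Welford's online algorithm for mean and variance of a stream."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance; defined only once we have at least two points.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

stats = RunningStats()
for x in [2.0, 4.0, 6.0]:
    stats.update(x)
print(stats.mean, stats.variance())  # → 4.0 4.0
```

Unlike a batch computation, each event updates the statistics immediately, which is exactly what lets hundreds of millions of events a day be summarized in (near) real time.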
Independent of the source of the data, the integration of event streams into an enterprise architecture is becoming more and more important in a world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. Storing such huge event streams into HDFS or a NoSQL datastore is feasible and no longer much of a challenge. But if you want to be able to react fast, with minimal latency, you cannot afford to first store the data and do the analysis later; you have to be able to include part of your analytics right after you consume the event streams. Products for event processing, such as Oracle Event Processing or Esper, have been available for quite a long time and used to be called Complex Event Processing (CEP). In the last 3 years, another family of products has appeared, mostly out of the Big Data technology space, called Stream Processing or Streaming Analytics. These are mostly open source products/frameworks, such as Apache Storm, Spark Streaming and Apache Samza, as well as supporting infrastructure such as Apache Kafka. In this talk I will present the theoretical foundations for event and stream processing, describe the differences you might find between the more traditional CEP and the more modern Stream Processing solutions, and show that a combination of both brings the most value.
Building a fraud detection application using the tools in the Hadoop ecosystem. Presentation given by authors of O'Reilly's Hadoop Application Architectures book at Strata + Hadoop World in San Jose, CA 2016.
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at... - Yael Garten
2017 Strata + Hadoop SJC conference talk. https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/56047
Description:
So, you finally have a data ecosystem with Kafka and Hadoop both deployed and operating correctly at scale. Congratulations. Are you done? Far from it.
As the birthplace of Kafka and an early adopter of Hadoop, LinkedIn has 13 years of combined experience using Kafka and Hadoop at scale to run a data-driven company. Both Kafka and Hadoop are flexible, scalable infrastructure pieces, but using these technologies without a clear idea of what the higher-level data ecosystem should be is perilous. Shirshanka Das and Yael Garten share best practices around data models and formats, choosing the right level of granularity of Kafka topics and Hadoop tables, and moving data efficiently and correctly between Kafka and Hadoop and explore a data abstraction layer, Dali, that can help you to process data seamlessly across Kafka and Hadoop.
Beyond pure technology, Shirshanka and Yael outline the three components of a great data culture and ecosystem and explain how to create maintainable data contracts between data producers and data consumers (like data scientists and data analysts) and how to standardize data effectively in a growing organization to enable (and not slow down) innovation and agility. They then look to the future, envisioning a world where you can successfully deploy a data abstraction of views on Hadoop data, like a data API as a protective and enabling shield. Along the way, Shirshanka and Yael discuss observations on how to enable teams to be good data citizens in producing, consuming, and owning datasets and offer an overview of LinkedIn’s governance model: the tools, process and teams that ensure that its data ecosystem can handle change and sustain #DataScienceHappiness.
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha... - Shirshanka Das
Data Gloveboxes: A Philosophy of Data Science Data Security - DataWorks Summit
Data scientists often have access to very sensitive material: data! Today's data scientists need a way to interact with toxic data, where spilling even a small amount could be destructive to a company. Securing compute clusters to work like the nuclear gloveboxes of old is one technique to limit data exfiltration and ensure data production is regularized, reliable and secure.
This talk will cover the philosophy and implementation of:
Data Dropbox: data goes in blindly but can be verified via checksums; data directionality is enforced. HDFS is used as a model, and the state of HBase is discussed.
Data Glovebox: one can manipulate data as desired but cannot exfiltrate it except via very specific, controlled processes; the Oozie Git action is a step in this direction.
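The "verified via checksums" idea behind the Data Dropbox can be illustrated in a few lines: record a SHA-256 digest on ingest, so any later reader can confirm the stored bytes were not altered. This is my own sketch with assumed function names, not the talk's implementation.

```python
import hashlib

def ingest(dropbox, name, data):
    """Store the payload together with its SHA-256 checksum."""
    dropbox[name] = (data, hashlib.sha256(data).hexdigest())

def verify(dropbox, name):
    """Re-hash the stored payload and compare it with the recorded digest."""
    data, digest = dropbox[name]
    return hashlib.sha256(data).hexdigest() == digest

box = {}
ingest(box, "sample.bin", b"malware sample bytes")
print(verify(box, "sample.bin"))  # → True
```

In HDFS this verification happens at the block level automatically, which is one reason the talk holds it up as a model.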
Managing Your Black Friday Logs - Voxxed Luxembourg - David Pilato
Monitoring a complex application is no easy task, but with the right tools it is not rocket science. Nevertheless, peak periods such as Black Friday sales or the Christmas season can push your application to the limits of what it can handle, or worse, make it crash. Because the system is under heavy load, it generates even more logs, which can in turn strain your monitoring system.
In this session, I will cover best practices for using the Elastic Stack to centralize and monitor your logs. I will also share some tips and tricks to help you get through your Black Fridays without trouble!
We will look at:
* Monitoring architectures
* Finding the optimal size for the _bulk API
* Distributing the load
* Sizing indices and shards
* Optimizing disk I/O
You will leave the session with best practices for building a monitoring system with the Elastic Stack, plus advanced tuning to optimize ingest and search performance.
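The "find the optimal _bulk size" advice above usually comes down to batching documents by both count and payload size before sending them to Elasticsearch. This is a hedged sketch; the caps are placeholders to tune, not recommendations, and the real request would serialize each batch in the _bulk newline-delimited format.

```python
import json

def batches(docs, max_docs=500, max_bytes=5_000_000):
    """Group documents into batches capped by document count and payload bytes."""
    batch, size = [], 0
    for doc in docs:
        line = json.dumps(doc)
        # Start a new batch when either cap would be exceeded.
        if batch and (len(batch) >= max_docs or size + len(line) > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append(doc)
        size += len(line)
    if batch:
        yield batch

docs = [{"msg": f"log line {i}"} for i in range(5)]
print([len(b) for b in batches(docs, max_docs=2)])  # → [2, 2, 1]
```

Benchmarking a few cap values against your own cluster is the usual way to find the sweet spot, since it depends on document size and hardware.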
Similar to Filtering From the Firehose: Real Time Social Media Streaming
Product owners and app developers are frequently tasked with designing API integrations between their apps and the cloud services used by their customers, partners or employees. Follow this 10-step guide, reflecting a common pattern for interactive integrations between your app and other cloud services.
Learn why (and how) leading SaaS providers are turning their products into platforms with the power of API integration. Innovative companies, such as PactSafe, Slack and Intercom, are making integration easier and accessible by shifting the burden of integration off of their customers.
The State of API Integration Report 2017 helps to address the proliferation challenge by providing trends, insights on ease of integration, data on where the industry is strong, and where it is going next. The data presented here comes from the Cloud Elements platform of API integrations with research provided by ProgrammableWeb and Datanyze, as well as industry experts. It will help all developers navigate the recent explosion of APIs and the implications of API integrations to work more efficiently in 2017 and beyond.
Cloud Elements | State of API Integration Report 2018 - Cloud Elements
The State of API Integration 2018 Report contains a full breakdown on the current state of the API industry, a look at what’s trending and why, and a look ahead to where we believe API integration is headed. This year’s report builds on observations from 2017, with the help of over 400 API enthusiasts who took the State of API Integration Survey at the end of last year.
Building Event Driven API Services Using WebhooksCloud Elements
Presented at 'All Things API' in Denver, CO by Travis McChesney, Director of Engineering at Cloud Elements.
How do you build and use user-defined callback URLs (known as webhooks) to notify your users of events that occurred on your system, or use those URLs to get remote notifications from API-connected systems you use?
Using Webhooks is becoming more common as APIs become essential to all programming models. We will cover four common usage models: API capture, TCP Tunneling, Dynamic DNS and Remote Development.
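One practical detail of consuming webhooks is authenticating the callback. This is an illustrative sketch, not any specific vendor's scheme: many providers sign the payload with a shared secret using HMAC, so the receiver can verify that the event really came from the sender. The secret and payload here are assumptions.

```python
import hashlib
import hmac

SECRET = b"shared-webhook-secret"  # hypothetical secret agreed with the provider

def sign(payload: bytes) -> str:
    """Compute an HMAC-SHA256 signature of the webhook payload."""
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()

def verify(payload: bytes, signature: str) -> bool:
    """Check the signature; compare_digest avoids timing side channels."""
    return hmac.compare_digest(sign(payload), signature)

body = b'{"event": "document.created"}'
sig = sign(body)
print(verify(body, sig))         # → True
print(verify(b"tampered", sig))  # → False
```

Receivers typically read the signature from a request header and reject any delivery that fails this check before processing the event.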
Mark Geene, CEO/Co-founder of Cloud Elements, presented "Lean Product Development" at Fort Collins Startup Week 2014. Check out the presentation for information on how to build a Lean startup. Based on principles from 'Lean Startup' by Eric Ries, 'Running Lean' by Ash Maurya and '500 Startups' by Dave McClure.
'Scalable Logging and Analytics with LogStash' - Cloud Elements
Rich Viet, Principal Engineer at Cloud Elements, presents 'Scalable Logging and Analytics with LogStash' at the All Things API meetup in Denver, CO.
Learn more about scalable logging and analytics using Logstash. This will be an overview of Logstash components, including getting started, indexing, storing, and getting information from logs.
Logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use (for example, for searching).
The Entrepreneurial Methodology: How engineers can harness the madness in a n... - Cloud Elements
All Things API meetup @ Galvanize, Denver CO
The Entrepreneurial Methodology:
How engineers can harness the madness in a non-technical start-up
Speaker: Brandon Vogt: VP Enterprise Architecture, Inspirato
• How to be proud of your code today, and into the future
• Getting 80% of the core technology right with 20% of the requirements
• Luck vs. strategy
• Iterative development and constant communication
• Pride – It commeth before a fall; It’s not personal – it’s business
• Requirements – Change daily; thoughts while shaving; Founder’s prerogative
• People – Experienced in the Business Concept or Industry; learn from previous failures or successes
• Quick Turns – Fail Fast
• Communication – Daily, Hourly – Sit close together; know each other
• Priorities – Strategic, Weekly, Daily
• User design first vs. A perfect database design
• MINDCRAFT and CODERCRAFT - how do you get a team of master carpenter’s to build an outhouse
• We live in a distributed world with niche software options; “The Daddy of Them All” and “This isn’t your daddy’s rodeo”.
The Cloud Elements Documents Hub is the first API that unifies Document Management across the industry’s leading cloud document and file storage services.
When provisioned via Cloud Elements, Box, Dropbox, Google Drive, Microsoft SharePoint and SkyDrive automatically plug in with our enterprise-class monitoring and logging console - providing real-time visibility into the performance and availability of these services.
Cloud Elements’ “one-to-many” approach allows you write to one API and connect to all the leading services in the Documents Hub. A uniform API provides the ability to search, store, retrieve and manage documents and files across leading services.
Our Elements support your multi-tenant application. One Element manages connections with an unlimited number of “instances” of each service. So you can have thousands of Google Drive or Box accounts connecting with your application.
Data normalization across API interactionsCloud Elements
With Vineet Joshi, CTO and Co-founder of Cloud Elements
Vineet discusses the normalization of data and similar domain models so it can uniformly act with end points that consume data of the same types.
Doing this declaratively instead of programmatically, the benefit is that once you have declared the transformation configuration of a given type of data, interaction between different endpoints is possible for the same type of data in endpoint specific formats.
Lean Product Development for Startups- Denver Startup Week Cloud Elements
Mark Geene, CEO/Co-founder of Cloud Elements presented "Lean Product Development for Startups" at Denver Startup Week 2013. Check out the presentation for information on how to build a Lean startup. Based on principles from 'Lean Startup' by Eric Ries, 'Running Lean' by Ash Maurya and '500 Startups' by Dave McClure.
All Things Cloud Meetup with John Henning, Technical Evangelist at Salesforce.com
"AppExchange for Developers: Monetizing Enterprise Apps on the Salesforce.com Platform"
Learn about the huge opportunity for creating a software business building and selling business apps using the Salesforce.com platform as a service. Includes Force.com technology, platform for business apps and the ISV program.
Cloud Elements CEO, Mark Geene's presentation for Startup Founder 101 event. July 9, 2013 at Galvanize Denver, CO. Lean product management principles, Startup Metrics for Pirates, Agile MVP planning and using Pivotal Tracker.
Money & Bitcoin & the Cloud: It's all just data streams, anyway!Cloud Elements
Michael Schonfeld, developer evangelist at Dwolla. Money and the Cloud: It's all just data streams, anyway. Moving money should be just as easy as moving bytes. So why do we keep struggling with 40 year old COBOL-written financial systems? Could this be why Bitcoins have picked up all this momentum? Come take an inside look at how Dwolla is changing a long-stagnated banking industry that is governed by ancient misguided notions of profit.
API Versioning in the Cloud. Presented to the Galvanize gSchool on 6/14/2013 by Travis McChesney, Senior Engineer- Cloud Elements. Content Negotiation, URI, URI Parameter, Cloud Elements Demo.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Filtering From the Firehose: Real Time Social Media Streaming
1. Filtering from the Firehose!
Real-time streaming of social network data
Jim Moffitt – Developer Advocate @gnip
@jimmoffitt
2. Who is this guy and what is he going to talk about?
• Introduction
• Social media firehoses
  • Data sources
  • Use-cases
• Needle in the haystack
  • Filtering from the firehose
  • Example use-case
• Server-side
  • Apache Kafka
  • Apache Cassandra
• Client-side
  • HTTP streaming code examples
  • Live streaming and search
3. What is a firehose?
• Continuous stream of flexibly structured (JSON) social media activities in near-real time.
• Potentially extreme amounts of data.
5. Accessing Social Data for Analytics
• Public APIs — Pros: it's free. Cons: rate limits, not guaranteed.
• Crawling/Scraping — Pros: open access. Cons: TOS issues, high latency, fragile.
• Licensed access (publisher provides data "firehose") — Pros: no rate limits, compliant, reliable. Cons: financial investment, not all publishers are covered.
6. Example firehose volumes
Publisher            Daily Activity
Twitter              450 M
Tumblr               96 M + 54 M votes
Foursquare           4.3 M
Disqus               1.9 M
Wordpress Comments   1.4 M
Wordpress Posts      0.6 M
GetGlue              0.6 M
7. Daily Tweet Activity Count
[Chart: tweets per day by month, 2006 through 2011 — roughly 5k/day in 2006, 200k/day in 2007, 1.6M/day in 2008, 25M/day in 2009, 80M/day in 2010, and 250M/day in 2011.]
8. Use-cases for Social Media Analysis
• Sales & Marketing
• Brand monitoring
• Customer Service
• Public Relations
• Emergency Response
• All kinds of academic research…
9. So you are building something around social media?
Some business considerations:
• Objective – what are the questions that you are trying to answer?
• Timeframe – real-time or historical use-case (or both)?
• Coverage – do I need all the data or some statistical sample?
• Licensing and Terms of Service
• Budgets
  • Data costs.
  • Software development.
  • Infrastructure (bandwidth, servers, storage).
10. So you are building something around social media?
Some technical considerations:
• Data transfer protocols: RESTful or 'keep-alive' streaming?
• What software language?
• Bandwidth: what does your peak volume need to be?
• Data storage
  • How and where are you storing the data?
  • What metadata do you need to store?
• Redundant streams?
11. What data comes with a tweet?
{
  "id": "tag:search.twitter.com,2005:388326436685103105",
  "objectType": "activity",
  "actor": {
    "objectType": "person",
    "id": "id:twitter.com:17200003",
    "link": "http://www.twitter.com/jimmoffitt",
    "displayName": "jimmoffitt",
    "postedTime": "2008-11-05T23:06:37.000Z",
    "image": "https://si0.twimg.com/profile_images/3678478654/6aac91cc6bd5711b82c83ebab0a55de0_normal.jpeg",
    "summary": "Once studied snow hydrology. Recently developed real-time weather monitoring and flood warning software. Have started a new adventure at an amazing company...",
    "links": [{"href": null, "rel": "me"}],
    "friendsCount": 69, "followersCount": 71, "listedCount": 1, "statusesCount": 189,
    "twitterTimeZone": "Mountain Time (US & Canada)",
    "verified": false, "utcOffset": "-21600",
    "preferredUsername": "jimmoffitt", "languages": ["en"],
    "location": {"objectType": "place", "displayName": "Longmont, Colorado"},
    "favoritesCount": 17
  },
  "verb": "post",
  "postedTime": "2013-10-10T15:33:31.000Z",
  "generator": {"displayName": "TweetDeck", "link": "http://www.tweetdeck.com"},
  "provider": {"objectType": "service", "displayName": "Twitter", "link": "http://www.twitter.com"},
  "link": "http://twitter.com/jimmoffitt/statuses/388326436685103105",
  "body": "Looking forward to this \"All Things Cloud\" meet-up in Denver next Tuesday 10/15 http://t.co/EQSCWMW4hL @gnip",
  "object": {
    "objectType": "note",
    "id": "object:search.twitter.com,2005:388326436685103105",
    "summary": "Looking forward to this \"All Things Cloud\" meet-up in Denver next Tuesday 10/15 http://t.co/EQSCWMW4hL @gnip",
    "link": "http://twitter.com/jimmoffitt/statuses/388326436685103105",
    "postedTime": "2013-10-10T15:33:31.000Z"
  },
  "favoritesCount": 0,
  "twitter_entities": {
    "hashtags": [], "symbols": [],
    "urls": [{"url": "http://t.co/EQSCWMW4hL", "expanded_url": "http://meetu.ps/1Fywpg", "display_url": "meetu.ps/1Fywpg", "indices": [80, 102]}],
    "user_mentions": [{"screen_name": "gnip", "name": "Gnip, Inc.", "id": 16958875, "id_str": "16958875", "indices": [103, 108]}]
  },
  "twitter_filter_level": "medium",
  "twitter_lang": "en",
  "retweetCount": 0,
  "gnip": {
    "matching_rules": [{"value": "\"All Things Cloud\"", "tag": null}, {"value": "from:jimmoffitt", "tag": null}],
    "urls": [{"url": "http://t.co/EQSCWMW4hL", "expanded_url": "http://www.meetup.com/All-things-Cloud-PaaS-SaaS-PaaS-XaaS/events/124584092/"}],
    "klout_score": 49,
    "klout_profile": {
      "topics": [{"klout_topic_id": "10000000000000000020", "displayName": "Tablets", "link": "http://klout.com/topic/id/10000000000000000020"}],
      "klout_user_id": "26177177599171892",
      "link": "http://klout.com/user/id/26177177599171892"
    },
    "language": {"value": "en"},
    "profileLocations": [{
      "objectType": "place",
      "geo": {"type": "point", "coordinates": [-105.10193, 40.16721]},
      "address": {"country": "United States", "countryCode": "US", "locality": "Longmont", "region": "Colorado"},
      "displayName": "Longmont, Colorado, United States"
    }]
  }
}
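The payload above is an Activity Streams JSON document, so pulling out the fields most workflows need (author, body, matched rules, profile geo) is a dictionary walk. A minimal sketch in Python — `summarize_activity` is an illustrative helper, not a Gnip library function, and the sample document is abbreviated from the payload above:

```python
import json

def summarize_activity(activity: dict) -> dict:
    """Extract commonly used fields from a Gnip Activity Streams tweet."""
    gnip = activity.get("gnip", {})
    locations = gnip.get("profileLocations", [])
    return {
        "actor": activity.get("actor", {}).get("preferredUsername"),
        "body": activity.get("body"),
        "matching_rules": [r["value"] for r in gnip.get("matching_rules", [])],
        "profile_location": locations[0]["displayName"] if locations else None,
    }

# Abbreviated sample based on the payload above
sample = json.loads("""
{"actor": {"preferredUsername": "jimmoffitt"},
 "body": "Looking forward to this \\"All Things Cloud\\" meet-up",
 "gnip": {"matching_rules": [{"value": "\\"All Things Cloud\\"", "tag": null}],
          "profileLocations": [{"displayName": "Longmont, Colorado, United States"}]}}
""")
print(summarize_activity(sample))
```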
12. Methods for filtering data
• Token filter (e.g. "pizza", "beer")
• Substrings (contains:sport)
• Exact phrases ("all things cloud")
• Operators: metadata (geo, language, profiles, account stats, ...)
• Operators: sampling (e.g. sample:10%)
• Publisher-specific operators: hashtags, user mentions/from/to, retweets, ...

Examples:
(pizza beer)
"all things cloud" profile_region:colorado
twins (baseball OR minnesota OR sports OR "small market") -(cute OR baby OR olsen OR olson)
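Gnip evaluates these rules server-side, but the core ideas — all tokens must match, quoted phrases match as a unit, a leading "-" negates — can be illustrated with a toy matcher. This is a deliberately simplified sketch, not the real PowerTrack rule grammar (no OR groups, no operators):

```python
def matches(rule_tokens, text):
    """Toy rule evaluator: every token must be present in the text;
    tokens prefixed with '-' must be absent. Quoted phrases are
    matched as substrings. Not the real PowerTrack grammar."""
    text_lc = text.lower()
    words = set(text_lc.split())
    for token in rule_tokens:
        negated = token.startswith("-")
        token = token.lstrip("-").strip('"').lower()
        # Phrases (containing spaces) match as substrings; bare tokens match whole words
        present = token in text_lc if " " in token else token in words
        if present == negated:
            return False
    return True

tweet = "All things cloud meetup tonight: pizza and beer provided"
print(matches(["pizza", "beer"], tweet))                    # True
print(matches(['"all things cloud"', "-baseball"], tweet))  # True
```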
13. Example use-case: Early-warning systems
Is there a Twitter 'signal' around local rain and flood events?

Business logic:
rain OR raining OR rained OR pouring OR weather OR hail OR lightning OR contains:flood OR "cats and dogs" OR wxreport OR contains:storm OR contains:precip

See http://blog.gnip.com/tweeting-in-the-rain Parts 1, 2 & 3
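With a rule like the one above in place, the "signal" question reduces to counting matched activities per time bucket and comparing the curve against rain gauge data. A minimal bucketing sketch (the timestamps are illustrative; the format string matches the `postedTime` field shown earlier):

```python
from collections import Counter
from datetime import datetime

def hourly_counts(posted_times):
    """Bucket ISO-8601 activity timestamps (postedTime field) into hourly counts."""
    buckets = Counter()
    for ts in posted_times:
        dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
        buckets[dt.strftime("%Y-%m-%d %H:00")] += 1
    return buckets

# Illustrative timestamps from matched activities
times = ["2013-09-12T01:05:00.000Z", "2013-09-12T01:40:00.000Z",
         "2013-09-12T02:10:00.000Z"]
print(hourly_counts(times))
```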
14. Social media and early-warning systems
There are generally three methods for geo-referencing Twitter data:
• Activity Location: tweets that are geo-tagged.
• Mentioned Location: parsing the tweet message for geographic location.
• Profile Location: parsing the Twitter account profile location provided by the user.

Share of geo-referenced tweets by method:
• User account profile: 82%
• Tweet text: 17%
• Tweet geo-tagging: 1%

See http://blog.gnip.com/tweeting-in-the-rain Parts 1, 2 & 3
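The three methods above can be tried in order of precision: an explicit geo-tag beats a place name in the text, which beats the account profile. A sketch of that precedence, assuming the Activity Streams field names shown in the earlier payload; the place list and `geo_reference` helper are illustrative (a real system would use a proper geo parser for mentioned locations):

```python
def geo_reference(activity: dict):
    """Return the most precise available location source for a tweet:
    activity geo-tag > mentioned location in text > profile location."""
    if activity.get("geo"):
        return "activity"
    # Mentioned location: here we just check against a toy place list
    known_places = {"longmont", "boulder", "denver"}
    body = activity.get("body", "").lower()
    if any(place in body for place in known_places):
        return "mentioned"
    if activity.get("gnip", {}).get("profileLocations"):
        return "profile"
    return None

print(geo_reference({"body": "Flooding in Boulder right now"}))
```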
15. Social media and early-warning systems
• Profile Location (old):
  bio_location_contains:louisville -(bio_location_contains:"co " OR bio_location_contains:colorado) -(bio_location_contains:"tn " OR bio_location_contains:tennessee)
• Profile Location (new):
  profile_locality:louisville profile_region:kentucky

See http://blog.gnip.com/tweeting-in-the-rain Parts 1, 2 & 3
16. Social media and early-warning systems
See http://blog.gnip.com/tweeting-in-the-rain Parts 1, 2 & 3
17. Social media and early-warning systems
See http://blog.gnip.com/tweeting-in-the-rain Parts 1, 2 & 3
18. Apache Kafka @ Gnip
Kafka is used to help manage streaming traffic with the outside world.
First application was with outbound streams: Gnip → Customer.
Helps provide an "on-disk" buffer for client streams. Write data to disk for a short period. If a client disconnects, when they reconnect their data buffer is "backfilled."
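The "backfill" behavior described above — messages are retained for a short period so a reconnecting client can resume where it left off — is what Kafka's offset-based, retained log provides. An in-memory sketch of the idea (a toy model, not Gnip's implementation or the Kafka API):

```python
from collections import deque

class BufferedStream:
    """Toy model of a retention buffer: messages get monotonically increasing
    offsets, and a reconnecting client replays everything after its last offset."""
    def __init__(self, retention=1000):
        self.buffer = deque(maxlen=retention)  # holds (offset, message) pairs
        self.next_offset = 0

    def publish(self, message):
        self.buffer.append((self.next_offset, message))
        self.next_offset += 1

    def read_from(self, last_seen_offset):
        """Backfill: return all retained messages the client missed."""
        return [m for off, m in self.buffer if off > last_seen_offset]

stream = BufferedStream()
for i in range(5):
    stream.publish(f"activity-{i}")
print(stream.read_from(2))  # client last saw offset 2 → activity-3, activity-4
```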
19. Apache Kafka @ Gnip
Next applied to inbound publisher streams: Publisher → Gnip.
Buffers incoming data and helps manage massive volume spikes. Spikes are isolated to this ingest tier. Downstream applications read data as fast as they can.
20. Apache Cassandra @ Gnip
Serves a moving window of Twitter data (currently 30 days). Will grow.
Chosen for its:
• Write-speeds
• Reliability
• Redundancy
• Scalability
21. Apache Cassandra @ Gnip
• Serves a variety of data services, products and use-cases.
• For Search we have an Apache Lucene index helping to quickly find Cassandra data.
• Nearly 50 Cassandra servers across test/staging/production environments.
22. Streaming social media
# List your current filtering rules
curl -ujmoffitt@gnipcentral.com https://api.gnip.com:443/accounts/jim/publishers/twitter/streams/track/dev/rules.json

# Add a rule
curl -v -X POST -ujmoffitt@gnipcentral.com "https://api.gnip.com:443/accounts/jim/publishers/twitter/streams/track/dev/rules.json" -d '{"rules":[{"tag":"demo","value":"weather OR rain OR snow"}]}'

# Connect to the stream
curl --compressed -v -ujmoffitt@gnipcentral.com "https://stream.gnip.com:443/accounts/jim/publishers/twitter/streams/track/dev.json"
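The streaming endpoint in the last command holds the HTTP connection open and delivers one JSON activity per line, with blank keep-alive lines in between. A consumer sketch in Python — the parsing logic is separated from the connection so it can be exercised without a live stream; the commented `requests` usage is a common pattern for such endpoints, not Gnip-specific client code:

```python
import json

def parse_stream(lines):
    """Yield parsed activities from an iterable of raw stream lines,
    skipping the blank keep-alive lines the endpoint sends."""
    for line in lines:
        if not line.strip():
            continue  # keep-alive heartbeat
        yield json.loads(line)

# Against a live endpoint this would be driven by something like:
#   import requests
#   resp = requests.get(url, auth=(user, password), stream=True)
#   for activity in parse_stream(resp.iter_lines(decode_unicode=True)):
#       handle(activity)

fake_stream = ['{"verb": "post", "body": "rain in Boulder"}',
               "",
               '{"verb": "post", "body": "snow"}']
for activity in parse_stream(fake_stream):
    print(activity["body"])
```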
23. Code examples
Search GitHub for "Twitter Stream" – 793 repository results.
• Python streaming connection
• Ruby streaming connection (using the 'curb' libcurl gem)
• Ruby streaming connection (using the EventMachine gem)