Top 10 Data Parallelism and Model Parallelism lessons from scaling H2O.
"Math Algorithms have primarily been the domain of desktop data science. With the success of scalable algorithms at Google, Amazon, and Netflix, there is an ever growing demand for sophisticated algorithms over big data. In this talk, we get a ringside view in the making of the world's most scalable and fastest machine learning framework, H2O, and the performance lessons learnt scaling it over EC2 for Netflix and over commodity hardware for other power users.
Top 10 Performance Gotchas is about the white hot stories of i/o wars, S3 resets, and muxers, as well as the power of primitive byte arrays, non-blocking structures, and fork/join queues. Of good data distribution & fine-grain decomposition of Algorithms to fine-grain blocks of parallel computation. It's a 10-point story of the rage of a network of machines against the tyranny of Amdahl while keeping the statistical properties of the data and accuracy of the algorithm."
Compendium of my Brisk, Cassandra & Hadoop talks of the Summer 2011 - Delivered at JavaOne2011. I like the content in this one personally as it touches, Usecase driven intro to Cassandra, NoSQL followed by Intro to hadoop - MapReduce, HDFS internals, NameNode and JobTrackers. And how Brisk decomposes the Single point of failures in HDFS while providing a single form for Realtime & Batch storage and processing.
(And it seemed enjoyable to the audience in attendance)
SFJava, SFNoSQL, SFMySQL, Marakana & Microsoft come together for a presentation evening of three NoSQL technologies - Apache Cassandra, Mongodb, Hadoop.
This talk lays out a few talking points for Apache Cassandra.
Brisk - Truly peer-to-peer hadoop.
Brisk is an open-source Hadoop & Hive distribution that uses Apache Cassandra for its core services and storage. Brisk makes it possible to run Hadoop MapReduce on top of CassandraFS, an HDFS-compatible storage layer. By replacing HDFS with CassandraFS, users leverage MapReduce jobs on Cassandra’s peer-to-peer, fault-tolerant and scalable architecture.
With CassandraFS all nodes are peers. Data files can be loaded through any node in the cluster and any node can serve as the JobTracker for MapReduce jobs. Hive MetaStore is stored & accessed as just another column family (table) on the distributed data store. Brisk makes Hadoop truly peer-to-peer.
We demonstrate visualisation & monitoring of Brisk using OpsCenter. The operational simplicity of cassandra’s multi-datacenter & multi-region aware replication makes Brisk well-suited for a rich set of Applications and usecases. And by being able to store and isolate hdfs & online data within the same data cluster, Brisk makes analytics possible without ETL!
LA Scalability Talk, Mahalo
May 31.2011
Compendium of my Brisk, Cassandra & Hadoop talks of the Summer 2011 - Delivered at JavaOne2011. I like the content in this one personally as it touches, Usecase driven intro to Cassandra, NoSQL followed by Intro to hadoop - MapReduce, HDFS internals, NameNode and JobTrackers. And how Brisk decomposes the Single point of failures in HDFS while providing a single form for Realtime & Batch storage and processing.
(And it seemed enjoyable to the audience in attendance)
SFJava, SFNoSQL, SFMySQL, Marakana & Microsoft come together for a presentation evening of three NoSQL technologies - Apache Cassandra, Mongodb, Hadoop.
This talk lays out a few talking points for Apache Cassandra.
Brisk - Truly peer-to-peer hadoop.
Brisk is an open-source Hadoop & Hive distribution that uses Apache Cassandra for its core services and storage. Brisk makes it possible to run Hadoop MapReduce on top of CassandraFS, an HDFS-compatible storage layer. By replacing HDFS with CassandraFS, users leverage MapReduce jobs on Cassandra’s peer-to-peer, fault-tolerant and scalable architecture.
With CassandraFS all nodes are peers. Data files can be loaded through any node in the cluster and any node can serve as the JobTracker for MapReduce jobs. Hive MetaStore is stored & accessed as just another column family (table) on the distributed data store. Brisk makes Hadoop truly peer-to-peer.
We demonstrate visualisation & monitoring of Brisk using OpsCenter. The operational simplicity of cassandra’s multi-datacenter & multi-region aware replication makes Brisk well-suited for a rich set of Applications and usecases. And by being able to store and isolate hdfs & online data within the same data cluster, Brisk makes analytics possible without ETL!
LA Scalability Talk, Mahalo
May 31.2011
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax Academy
Internet of Things (IoT) data frequently has a location and time component. Getting value out of this "geotemporal" data can be tricky. We'll explore when and how to leverage Cassandra, DSE Search and DSE Analytics to surface meaningful information from your geotemporal data.
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...DataStax
We built an application based on the principles of CQRS and Event Sourcing using Cassandra and Spark. During the project we encountered a number of challenges and problems with Cassandra and the Spark Connector.
In this talk we want to outline a few of those problems and our actions to solve them. While some problems are specific to CQRS and Event Sourcing applications most of them are use case independent.
About the Speakers
Matthias Niehoff IT-Consultant, codecentric AG
works as an IT-Consultant at codecentric AG in Germany. His focus is on big data & streaming applications with Apache Cassandra & Apache Spark. Yet he does not lose track of other tools in the area of big data. Matthias shares his experiences on conferences, meetups and usergroups.
Stephan Kepser Senior IT Consultant and Data Architect, codecentric AG
Dr. Stephan Kepser is an expert on cloud computing and big data. He wrote a couple of journal articles and blog posts on subjects of both fields. His interests reach from legal questions to questions of architecture and design of cloud computing and big data systems to technical details of NoSQL databases.
Presented to the Dublin Cassandra User Group by Niall Milton of DigBigData. This presentation is on Cassandra and its use with other technologies such as Storm, Spark, Hadoop, ElasticSearch and Redis. This presentation should act as a solid foundation to explore some of the mentioned technologies in more depth.
Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...Data Con LA
After a brief technical introduction to Apache Cassandra we'll then go into the exciting world of Apache Spark integration, and learn how you can turn your transactional datastore into an analytics platform. Apache Spark has taken the Hadoop world by storm (no pun intended!), and is widely seen as the replacement to Hadoop Map Reduce. Apache Spark coupled with Cassandra are perfect allies, Cassandra does the distributed data storage, Spark does the distributed computation.
Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...DataStax
With the addition of vnodes (Virtual Nodes), Cassandra users were able to gain a few benefits as a result of streaming when it came to bootstrapping and decommissioning nodes. On the flip side, having to route requests on larger clusters became a lot more intensive of a workload for all nodes that were then forced to act coordinator nodes. By setting up a tier of proxy nodes, we were able to have our cluster of 50 nodes perform with a 300% improvement on average in a mixed workload environment. This is an explanation of what we did, how we did it, and why it works.
About the Speaker
Eric Lubow CTO, SimpleReach
Eric Lubow is CTO of SimpleReach, where he builds highly-scalable distributed systems for processing analytics data. Eric is also a DataStax MVP for Cassandra, and co-author of Practical Cassandra. In his spare time, Eric is a skydiver, motorcycle rider, mixed martial artist, and dog dad.
Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...DataStax
Cassandra is getting more and more buzz and that means two things, more development and more issues. Some issues are unavoidable, but some of them are, just by understanding how our tooling works.
In this talk I'd like to review the core concepts on which Cassandra is built and how they impose the way we should work with it using some examples that will hopefully give you both a 'Quick Reference' and a 'Checklist' to go through every time you want to build scalable data models.
About the Speaker
Carlos Alonso Software Engineer, Job and Talent
Carlos received his Masters CS at Salamanca University, Spain. He worked a few years there in a digital agency, gaining expertise on a very wide range of technologies before moving to London where he narrowed down the focus on to the backend and data engineering disciplines. The latest step in his professional career was to move back to Madrid to work for Job and Talent where he currently helps on building the best candidate-job opening matching technology. Aside from work he likes sharing as much as he can by public speaking, mentoring or getting involved in OSS or OpenData initiatives.
Over the past year data at GumGum has quadrupled. Now a days we process 20 TB of new data every day. New data/reporting requirements pour in every week. Usage of the real time data is growing with the daily data. In this talk we are tying to answer the following questions: How do we serve real time data with daily batched data to our consumers together? What is Lambda Architecture and how does it help? What role does Cassandra play in Lambda Architecture at GumGum? How did we solve few bottlenecks in the architecture using Cassandra? How Cassandra can help you avoid microbatching and give you a true realtime data?
About the Speaker
Vaibhav Puranik VP of Engineering, Big Data & Platform, GumGum
Vaibhav has 15 years of experience in Software. He began his career with Johnson Space Center in Houston and has a masters degree in computer science. For past 5 years Vaibhav has been responsible for architecting multiple big data systems at GumGum. He manages Data Science, Data Engineering and DevOps teams at GumGum.
In this slidecast, SriSatish Ambati from 0xdata describes the company's new H20 Open Source, In-memory Machine Learning application for Big Data.
"We developed H2O to unlock the predictive power of big data through better algorithms," said SriSatish Ambati, CEO and co-founder of 0xdata. "H2O is simple, extensible and easy to use and deploy from R, Excel and Hadoop. The big data science world is one of algorithm-haves and have-nots. Amazon, Goldman Sachs, Google and Netflix have proven the power of algorithms on data. With our viral and open Apache software license philosophy, along with close ties into the math, Hadoop and R communities, we bring the power of Google-scale machine learning and modeling without sampling to the rest of the world."
Watch the presentation video: http://wp.me/p3RLEV-1xc
Learn more: http://0xdata.com
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax Academy
Internet of Things (IoT) data frequently has a location and time component. Getting value out of this "geotemporal" data can be tricky. We'll explore when and how to leverage Cassandra, DSE Search and DSE Analytics to surface meaningful information from your geotemporal data.
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...DataStax
We built an application based on the principles of CQRS and Event Sourcing using Cassandra and Spark. During the project we encountered a number of challenges and problems with Cassandra and the Spark Connector.
In this talk we want to outline a few of those problems and our actions to solve them. While some problems are specific to CQRS and Event Sourcing applications most of them are use case independent.
About the Speakers
Matthias Niehoff IT-Consultant, codecentric AG
works as an IT-Consultant at codecentric AG in Germany. His focus is on big data & streaming applications with Apache Cassandra & Apache Spark. Yet he does not lose track of other tools in the area of big data. Matthias shares his experiences on conferences, meetups and usergroups.
Stephan Kepser Senior IT Consultant and Data Architect, codecentric AG
Dr. Stephan Kepser is an expert on cloud computing and big data. He wrote a couple of journal articles and blog posts on subjects of both fields. His interests reach from legal questions to questions of architecture and design of cloud computing and big data systems to technical details of NoSQL databases.
Presented to the Dublin Cassandra User Group by Niall Milton of DigBigData. This presentation is on Cassandra and its use with other technologies such as Storm, Spark, Hadoop, ElasticSearch and Redis. This presentation should act as a solid foundation to explore some of the mentioned technologies in more depth.
Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...Data Con LA
After a brief technical introduction to Apache Cassandra we'll then go into the exciting world of Apache Spark integration, and learn how you can turn your transactional datastore into an analytics platform. Apache Spark has taken the Hadoop world by storm (no pun intended!), and is widely seen as the replacement to Hadoop Map Reduce. Apache Spark coupled with Cassandra are perfect allies, Cassandra does the distributed data storage, Spark does the distributed computation.
Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...DataStax
With the addition of vnodes (Virtual Nodes), Cassandra users were able to gain a few benefits as a result of streaming when it came to bootstrapping and decommissioning nodes. On the flip side, having to route requests on larger clusters became a lot more intensive of a workload for all nodes that were then forced to act coordinator nodes. By setting up a tier of proxy nodes, we were able to have our cluster of 50 nodes perform with a 300% improvement on average in a mixed workload environment. This is an explanation of what we did, how we did it, and why it works.
About the Speaker
Eric Lubow CTO, SimpleReach
Eric Lubow is CTO of SimpleReach, where he builds highly-scalable distributed systems for processing analytics data. Eric is also a DataStax MVP for Cassandra, and co-author of Practical Cassandra. In his spare time, Eric is a skydiver, motorcycle rider, mixed martial artist, and dog dad.
Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...DataStax
Cassandra is getting more and more buzz and that means two things, more development and more issues. Some issues are unavoidable, but some of them are, just by understanding how our tooling works.
In this talk I'd like to review the core concepts on which Cassandra is built and how they impose the way we should work with it using some examples that will hopefully give you both a 'Quick Reference' and a 'Checklist' to go through every time you want to build scalable data models.
About the Speaker
Carlos Alonso Software Engineer, Job and Talent
Carlos received his Masters CS at Salamanca University, Spain. He worked a few years there in a digital agency, gaining expertise on a very wide range of technologies before moving to London where he narrowed down the focus on to the backend and data engineering disciplines. The latest step in his professional career was to move back to Madrid to work for Job and Talent where he currently helps on building the best candidate-job opening matching technology. Aside from work he likes sharing as much as he can by public speaking, mentoring or getting involved in OSS or OpenData initiatives.
Over the past year data at GumGum has quadrupled. Now a days we process 20 TB of new data every day. New data/reporting requirements pour in every week. Usage of the real time data is growing with the daily data. In this talk we are tying to answer the following questions: How do we serve real time data with daily batched data to our consumers together? What is Lambda Architecture and how does it help? What role does Cassandra play in Lambda Architecture at GumGum? How did we solve few bottlenecks in the architecture using Cassandra? How Cassandra can help you avoid microbatching and give you a true realtime data?
About the Speaker
Vaibhav Puranik VP of Engineering, Big Data & Platform, GumGum
Vaibhav has 15 years of experience in Software. He began his career with Johnson Space Center in Houston and has a masters degree in computer science. For past 5 years Vaibhav has been responsible for architecting multiple big data systems at GumGum. He manages Data Science, Data Engineering and DevOps teams at GumGum.
In this slidecast, SriSatish Ambati from 0xdata describes the company's new H20 Open Source, In-memory Machine Learning application for Big Data.
"We developed H2O to unlock the predictive power of big data through better algorithms," said SriSatish Ambati, CEO and co-founder of 0xdata. "H2O is simple, extensible and easy to use and deploy from R, Excel and Hadoop. The big data science world is one of algorithm-haves and have-nots. Amazon, Goldman Sachs, Google and Netflix have proven the power of algorithms on data. With our viral and open Apache software license philosophy, along with close ties into the math, Hadoop and R communities, we bring the power of Google-scale machine learning and modeling without sampling to the rest of the world."
Watch the presentation video: http://wp.me/p3RLEV-1xc
Learn more: http://0xdata.com
The need to process huge data is increasing day by day. Processing huge data involves compute, network and storage. In terms of Big Data, What it takes to innovate and what is innovation at the end? This talk provide high level details on the need of big data and capabilities of Mapr converged data platform.
Speaker: Vijaya Saradhi Uppaluri, Technical Director at MapR Technologies
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...StampedeCon
From the StampedeCon 2015 Big Data Conference: There is an adage, “If you fail to plan, you plan to fail” . When developing systems the adage can be taken a step further, “If you fail to plan FOR FAILURE, you plan to fail”. At Huffington post data moves between a number of systems to provide statistics for our technical, business, and editorial teams. Due to the mission-critical nature of our data, considerable effort is spent building resiliency into processes.
This talk will focus on designing for failure. Some material will focus understanding the traits of specific distributed systems such as message queues or NoSQL databases and what are the consequences for different types of failures. While other parts of the presentation will focus on how systems and software can be designed to make re-processing batch data simple, or how to determine what failure mode semantics are important for a real time event processing system.
Talk to techmeetup Aberdeen on bigdata and nosql
Some links seem to be missing from the onscreen presentation, particularly http://www.dbshards.com/dbshards/ for the sharding diagram
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioAlluxio, Inc.
Alluxio Global Online Meetup
Apr 23, 2020
For more Alluxio events: https://www.alluxio.io/events/
Speakers:
Jiao (Jennie) Wang, Intel
Tsai Louie, Intel
Bin Fan, Alluxio
Today, many people run deep learning applications with training data from separate storage such as object storage or remote data centers. This presentation will demo the Intel Analytics Zoo + Alluxio stack, an architecture that enables high performance while keeping cost and resource efficiency balanced without network being I/O bottlenecked.
Intel Analytics Zoo is a unified data analytics and AI platform open-sourced by Intel. It seamlessly unites TensorFlow, Keras, PyTorch, Spark, Flink, and Ray programs into an integrated pipeline, which can transparently scale from a laptop to large clusters to process production big data. Alluxio, as an open-source data orchestration layer, accelerates data loading and processing in Analytics Zoo deep learning applications.
This talk, we will go over:
- What is Analytics Zoo and how it works
- How to run Analytics Zoo with Alluxio in deep learning applications
- Initial performance benchmark results using the Analytics Zoo + Alluxio stack
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Bhupesh Bansal
Jan 22nd, 2010 Hadoop meetup presentation on project voldemort and how it plays well with Hadoop at linkedin. The talk focus on Linkedin Hadoop ecosystem. How linkedin manage complex workflows, data ETL , data storage and online serving of 100GB to TB of data.
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eri...confluent
Do you know who is knocking on your network’s door? Have new regulations left you scratching your head on how to a handle what is happening in your network? Network flow data helps answer many questions across a multitude of use cases including network security, performance, capacity planning, routing, operational troubleshooting and more. Today’s modern day streaming data pipelines need to include tools that can scale to meet the demands of these service providers while continuing to provide responsive answers to difficult questions. In addition to stream processing, data needs to be stored in a redundant, operationally focused database to provide fast, reliable answers to critical questions. Together, Kafka and Druid work together to create such a pipeline.
In this talk Eric Graham and Rachel Pedreschi will discuss these pipelines and cover the following topics: Network flow use cases and why this data is important. Reference architectures from production systems at a major international Bank. Why Kafka and Druid and other OSS tools for Network flows. A demo of one such system.
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...confluent
Do you know who is knocking on your network’s door? Have new regulations left you scratching your head on how to handle what is happening in your network? Network flow data helps answer many questions across a multitude of use cases including network security, performance, capacity planning, routing, operational troubleshooting and more. Today’s modern day streaming data pipelines need to include tools that can scale to meet the demands of these service providers while continuing to provide responsive answers to difficult questions. In addition to stream processing, data needs to be stored in a redundant, operationally focused database to provide fast, reliable answers to critical questions. Together, Kafka and Druid work together to create such a pipeline.
In this talk Eric Graham and Rachel Pedreschi will discuss these pipelines and cover the following topics:
-Network flow use cases and why this data is important.
-Reference architectures from production systems at a major international Bank.
-Why Kafka and Druid and other OSS tools for Network Flows.
-A demo of one such system.
Content presented at a talk on Aug. 29th. Purpose is to inform a fairly technical audience on the primary tenets of Big Data and the hadoop stack. Also, did a walk-thru' of hadoop and some of the hadoop stack i.e. Pig, Hive, Hbase.
Morten Egan gives a short introduction to Big Data and what it is all about. What is MapReduce, HDFS, Hive, Pig and HCatalog? Also, a short introduction to Hortonworks.
This presentation was made for the danish oracle user group.
SF Java presentation of jvm goes to big data.
“Slowly yet surely the JVM is going to Big Data! In this fun filled presentation we see what pieces of Java & JVM triumph or unravel in the battle for performance at high scale!”
Concurrency is the currency of scale on multi-core & the new generation of collections and non-blocking hashmaps are well worth the time taking a deep dive into. We take a quick look at the next gen serialization techniques as well as implementation pitfalls around UUID. The achilles' heel for JVM remains Garbage Collection: a deep dive into the internals of the memory model, common GC algorithms and their tuning knobs is always a big draw. EC2 & cloud present us with a virtualized & unchartered territory for scaling the JVM.
We will leave some room for Q&A or fill it up with any asynchronous I/O that might queue up during the talk. A round of applause will be due to the various tools that are essentials for Java performance debugging.
invited netflix talk: JVM issues in the age of scale! We take an under the hood look at java locking, memory model, overheads, serialization, uuid, gc tuning, CMS, ParallelGC, java.
Caching in Java - A review of different caching vendors (Oracle Coherence, Apache Cassandra, Infinispan, Ehcache/Terracotta, etc) and limitations presented by the underlying Java Platform.
Presented at RedHat Summit 2010, Boston
Speakers: SriSatish Ambati, Performance Engg
Manik Surtani, InfiniSpan Lead
Presentation details from RH Summit:
How to Stop Worrying & Start Caching in Java
SriSatish Ambati — Performance & Partner Engineer, Azul Systems, Inc.
Manik Surtani — Principal Software Engineer, Red Hat
Application data caching has come of age as distributed and large cache clusters are now common. The next generation of applications that depend on efficient caching has come into being and data and cache size explosion has set in.
In this session, Azul Systems’ SriSatish Ambati and Red Hat’s Manik Surtani will survey performance characteristics of different cache algorithms, their implementations (e.g., implementing a 200Gb data cache size), and how well they work in practical JVM deployments. In each scenario, they will present patterns of architecture that scale, and demonstrate where read and write performance stands in the context of increasing cache sizes and concurrency.
Throughout this discussion, they will recognize several villains, including heap fragmentation, long-lived objects, multi-VM communication, socket handlers, and queue managers. SriSatish and Manik will take a fun-filled “whodunit” approach to portray the roles played by each villain in killing cache performance.
http://www.redhat.com/promo/summit/2010/sessions/jboss.html
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...srisatish ambati
Top 10 Causes for Java Issues in Production and What to Do When Things Go Wrong
JavaOne 2010.
Abstract: It's Friday evening and you hear the first rumble . . . one java node has become slightly unresponsive. You lookup the process, get a thread dump, and for good measure restart it at 8 p.m. Saturday afternoon is when you realize that other nodes have caught the flu and you get the ugly call from the customer. In a matter of hours, you're on that conference bridge with support groups of different packages and Java vendors and one of your uberarchitects. Yes, production instances are up and down, and restarting like there's no tomorrow. Here's an accumulated compendium of the op 10 things that can cause Java production heartburn and what to do when your Java production is on fire. And yes, please have your tools belt on.
Speaker(s):
Cliff Click, Azul Systems, Distinguished Engineer
SriSatish Ambati, Azul Systems, Performance Engineer
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)srisatish ambati
Cache & Concurrency considerations for a high performance Cassandra deployment.
SriSatish Ambati
Cassandra has hit it's stride as a distributed java NoSQL database! It's fast, it's in-memory, it's scalable, it's seda; It's eventually consistent model makes it practical for the large & growing volumes of unstructured data usecases. It is also time to run it through the filters of performance analysis. For starters it runs on the java virtual machine and inherits the capabilities and culpabilities of the platform. This presentation reviews the runtime architecture, cache behavior & performance of a real-world workload on Cassandra. We blend existing system & jvm tools to get a quick overview & a breakdown of hotspots in the get, put & update operations. We highlight the role played by garbage collection & fragmentation due to long lived objects; We investigate lock contention in the data structures under concurrent usage. Cassandra uses UDP for management & TCP for data: we look at robustness of the communication patterns during high spikes and cluster-wide events. We review Non-Blocking Hashmap modifications to Cassandra that improve concurrency & amplify performance of this frontrunner in the NoSQL space
ApacheCon2010 NA
Wed, 03 November 2010 15:00
cassandra
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
5. Group
By
Grep
Messy
NAs
Classifica-on
Regression
Clustering
Ensembles
100’s
nanos
models
H 2O
Big Data
the
Adhoc
Explora-on
Math
Modeling
Real-‐-me
Scoring
Prediction
Engine
6. No New API!
Big
Data
Explora-on
Modeling
Scoring
Real-‐-me
H 2O
the
Prediction
Engine
Approximate!
results each step!
7. Intellectual
Legacy
Math
needs
to
be
free
Open
Source
Support and Innovation
hFps://github.com/0xdata/h2o
H 2O
the
Prediction
Engine
9. 10
Move Code not Data
Data chunks > code chunks
TCP for Data. UDP for Control.
>> Generated Java Assist
10. A Chunk, Unit of Parallel Access
A Frame: Vec[]
age
sex
zip
ID
car
JVM
1
Heap
JVM
2
Heap
JVM
3
Heap
JVM
4
Heap
Vecs aligned
in heaps
l Optimized for
concurrent access
l Random access
any row, any JVM
l
11. 9
Chunk-ing Express!
season for Variable-sized chunks
and a season Uniform chunks.
Tightly-packed!
(chunk is also unit of batch!)
12. 8
Reduce early. Reduce Often!
No Expensive intermediate states.
Fine-grain parallelism wins!
>> Fork / Join
13. 8
Reduce early. Reduce Often!
Vec
Vec
Vec
Vec
Vec
All CPUs grab
Chunks in parallel
Map/Reduce & F/J
handles all sync
JVM
1
Hea
p
JVM
2
Hea
p
JVM
3
Hea
p
JVM
4
Hea
p
14. 7
Slow is not different from Dead
Debugging slow
>> Heartbeats, Messages
Two General’s Paradox
15. 6
Memory Manager
in-memory system as good as
your memory manager!
lazy eviction.
compress.
align.
Corollary: Track down Leaks!
16. 5
Memory Overheads
Use primitives
// A Distributed Vector
//
much more than 2billion elements
class Vec {
long length(); // more than an int's worth
// fast random access
double at(long idx); // Get the idx'th elem
boolean isNA(long idx);
}
void set(long idx, double d); // writable
void append(double d); // variable sized
17. 4
Cache-‐Oblivious
Tree size
Bin size
Recursively divide
Till Data à Cache
18. 3
EC2 – Nothing is bounded
User-mode reliability
S3 Readers will TCP Reset
Mux your connections
Not all toolkits are equal.
>> JetS3
19. 2 No Locks, No Cry
Non-Blocking Data Structures.
// VOLATILE READ before key compare.
// CAS
private final boolean CAS_kvs( final Object[]
oldkvs, final Object[] newkvs ) {
return _unsafe.compareAndSwapObject(this,
_kvs_offset, oldkvs, newkvs );
}
45. Distributed Coding Taxonomy
l
No Distribution Coding:
l
l
l
Whole Algorithms, Whole Vector-Math!
REST + JSON: e.g. load data, GLM, get results!
Simple Data-Parallel Coding:
l
l
l
Per-Row (or neighbor row) Math!
Map/Reduce-style: e.g. Any dense linear algebra!
Complex Data-Parallel Coding
l
K/V Store, Graph Algo's, e.g. PageRank!
0xdata.c45
46. Distributed Coding Taxonomy
l
No Distribution Coding:
l
l
Whole Algorithms, Whole Vector-Math!
l
REST + JSON: e.g. load data, GLM, get results!
Simple Data-Parallel Coding:
l
Per-Row (or neighbor row) Math!
l
l
Read
the
docs!
This
talk!
Map/Reduce-style: e.g. Any dense linear algebra!
Complex Data-Parallel Coding
l
K/V Store, Graph Algo's, e.g. PageRank!
Join
our
GIT!
46
47. Distributed Data Taxonomy
Frame – a collection of Vecs
Vec – a collection of Chunks
Chunk – a collection of 1e3 to 1e6 elems
elem – a java double
Row i – i'th elements of all the Vecs in a Frame
0xdata.c47