Unstructured data is everywhere: posts, status updates, blog entries, and news feeds in social media, and customer interactions captured in call center CRM systems. While many organizations study and monitor social media to track brand value and target specific customer segments, in our experience blending unstructured data with structured data to supplement data science models has been far more effective than working with either independently.
In this talk we will showcase an end-to-end topic and sentiment analysis pipeline we've built on the Pivotal Greenplum Database platform for Twitter feeds from GNIP, using open source tools like MADlib and PL/Python. We've used this pipeline to build regression models that predict commodity futures from tweets and to enhance telecom churn models through topic and sentiment analysis of call center transcripts. All of this was possible because of the flexibility and extensibility of the platform we worked with.
Leveraging Docker for Hadoop build automation and Big Data stack provisioning (DataWorks Summit)
Apache Bigtop, an open source Hadoop distribution, focuses on developing packaging, testing, and deployment solutions that help infrastructure engineers build their own customized big data platforms as easily as possible. However, packages deployed in production require a solid CI testing framework to ensure their quality, and the many Hadoop components must also be verified to work together. In this presentation, we'll talk about how Bigtop delivers its containerized CI framework, which can be directly replicated by Bigtop users. The core innovations here are the newly developed Docker Provisioner, which leverages Docker for Hadoop deployment, and the Docker Sandbox, which lets developers quickly start a big data stack. This talk covers the containerized CI framework, the technical details of the Docker Provisioner and Docker Sandbox, the hierarchy of Docker images we designed, and several components we developed, such as the Bigtop Toolchain, to achieve build automation.
Overview presentation on Systematic Innovation based on TRIZ (the Theory of Inventive Problem Solving). It presents the origin and a brief history, the basic concepts, a consolidated roadmap for use, and the typical tools according to the application context: solving complex problems, developing new products or systems, or intellectual property (patent handling).
Andrej Karpathy, a founding member of OpenAI, explains in "State of GPT" the process of training GPT models and the emerging ecosystem of large language models. It starts with pre-training on large datasets, which produces the base model through tokenization and language modeling. Andrej also explains that LLaMA, a smaller model, is more powerful than GPT-3 despite containing fewer parameters. The speaker discusses the training of Transformer models for language modeling, followed by the evolution of base models that have arisen since GPT-2. The training process consists of pre-training, supervised fine-tuning, reward modeling, and reinforcement learning. The speaker also talks about improving the performance of Transformers through prompting, self-consistency, and prompt engineering. Finally, the speaker addresses the limitations of LLMs, including biases and reasoning errors, and suggests using them in low-stakes applications with human oversight.
How does ChatGPT work: an Information Retrieval perspective (Sease)
In this talk, we will explore the underlying mechanisms of ChatGPT, a large-scale language model developed by OpenAI, from the perspective of Information Retrieval (IR). We will delve into the process of training the model on massive amounts of data, the techniques used to optimize the model’s performance, and how IR concepts such as tokenization, vectorization, and ranking are used in generating responses. We will also discuss how ChatGPT handles contextual understanding and how it leverages the power of transfer learning to generate high-quality and relevant responses. Software engineers will gain insight into how a modern conversational AI system like ChatGPT works, providing a better understanding of its strengths and limitations, and how best to integrate it into their software applications.
This abstract was written entirely by ChatGPT with the simple input prompt <Write an abstract for a talk called “How does ChatGPT work? An Information Retrieval perspective”, the audience is software engineers>.
Architectures de terre crue_Sophie Bronchart_Conférence européenne Eco-Matériaux (ecobuild.brussels)
Presentation on raw earth in sustainable construction and renovation, given by Sophie Bronchart at the European eco-materials conference, held under the aegis of the greenov cluster and the Eco-Construction and Ecobuild clusters.
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report a compilation of the most interesting things we’ve seen, with the goal of triggering an informed conversation about the state of AI and its implications for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
Produced by Nathan Benaich and the Air Street Capital team
Skills Ontology from It's Your Skills is a rich, dynamic database of skills. Its purpose is to make capturing and mapping the skills of people and jobs across industries and functions easy, precise, and holistic. Skills Ontology's key features include granularity in skills, contextual meaning for terms, normalization of terms, and relationships between skills and skill groups.
LangChain intro; Keymate.AI Search Plugin for ChatGPT; how to use the LangChain library; how to implement similar functionality in the programming language of your choice; example LangChain applications.
The presentation revolves around the concept of LangChain. This innovative framework is designed to "chain" together different components to create more advanced use cases around Large Language Models (LLMs). The idea is to leverage the power of LLMs to tackle complex problems and generate solutions that are more than the sum of their parts.
One of the key features of the presentation is the application of the "Keymate.AI Search" plugin in conjunction with the Reasoning and Acting (ReAct) framework. The presenter encourages the audience to use these tools to generate reasoning traces and actions. The ReAct framework, learned from an initial search, is then applied to these traces and actions, demonstrating the potential of LLMs to learn and apply complex frameworks.
The presentation also delves into the impact of climate change on biodiversity. The presenter prompts the audience to look up the latest research on this topic and summarize the key findings. This exercise not only highlights the importance of climate change but also demonstrates the capabilities of LLMs in researching and summarizing complex topics.
The presentation concludes with several key takeaways. The presenter emphasizes that specialized custom solutions work best and suggests a bottom-up approach to expert systems. However, they caution that over-abstraction can lead to leakage, causing time and budget limits to be hit early and tasks to fail or require many iterations. The presenter also notes that while prompt engineering is important, it is not necessary to over-optimize if the LLM is capable enough. The presentation ends on a hopeful note, expressing the need for more capable LLMs and acknowledging that good applications are rare but achievable.
Overall, the presentation provides a comprehensive overview of the LangChain framework, its applications, and the potential of LLMs in solving complex problems. It serves as a call to action for the audience to explore these tools and frameworks.
ChatGPT is a chatbot developed by OpenAI and launched in November 2022.
Useful to all school and college students.
Kindly use ChatGPT to enhance your knowledge.
ChatGPT and Bard AI are AI language models. ChatGPT, based on GPT-3.5, provides engaging conversation and responses and is trained on vast amounts of internet text. Bard AI, however, is not described here in enough detail for a direct comparison. Both aim to facilitate conversation and deliver meaningful interactions, but their capabilities depend on their respective architectures and training data.
OpenAI GPT in Depth - Questions and Misconceptions (Ivo Andreev)
OpenAI GPT in depth – misconceptions and questions you would like answered
Have you ever wondered why GPT models work? Do you ask questions like:
How does GPT work? Why does the same question receive different answers for different users? Is there a way to improve explainability? Can a GPT model provide its sources? Why does Bing Chat work differently? How can I get better performance and improve completions? How can I work with my enterprise's data? What practical business cases is a generative AI model suited to solve?
If you are tired of sessions that just scratch the surface of OpenAI GPT, this one will go deeper and answer the why, the why not, and the how.
My presentation entitled 'AI, Creativity and Generative Art', presented at the annual symposium for AI students (CKI) at Utrecht University, Fri. June 16th, 2017
HDFS has several strengths: it horizontally scales its I/O bandwidth and scales its storage to petabytes. Further, it provides very low-latency metadata operations and scales to over 60K concurrent clients. Hadoop 3.0 recently added erasure coding. One of HDFS’s limitations is scaling the number of files and blocks in the system. We describe a radical change to Hadoop’s storage infrastructure with the upcoming Ozone technology. It allows Hadoop to scale to tens of billions of files and blocks and, in the future, to ever larger numbers of smaller objects. Ozone fundamentally separates the namespace layer from the block layer, allowing new namespace layers to be added in the future. Further, use of the Raft protocol has allowed the storage layer to be self-consistent. We show how this technology helps a Hadoop user and what it means for evolving HDFS in the future. We will also cover the technical details of Ozone.
Speaker: Sanjay Radia, Chief Architect, Founder, Hortonworks
SentiTweet is a sentiment analysis tool for identifying the sentiment of tweets as positive, negative, or neutral. SentiTweet comes to the rescue when you need the sentiment of a single tweet or a set of tweets. It also enables you to find the sentiment of an entire tweet or of specific phrases within it.
Make a query on a topic of interest and see the sentiment for the day as a pie chart, or for the week as a line chart, for tweets gathered from twitter.com.
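The per-day aggregation behind such a pie chart can be sketched in a few lines (a minimal illustration with invented labels, not SentiTweet's actual code):

```python
from collections import Counter

def sentiment_shares(labels):
    """Aggregate per-tweet sentiment labels into percentage shares,
    e.g. for a daily pie chart."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}

# Hypothetical labels for one day's tweets on a topic.
day = ["positive", "negative", "neutral", "positive", "positive"]
print(sentiment_shares(day))  # {'positive': 60.0, 'negative': 20.0, 'neutral': 20.0}
```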
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Science! (Sarah Aerni)
Slides from the Pivotal Open Source Hub Meetup
"Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Science!"
As the need for data science as a key differentiator grows in all industries, from large corporations to startups, getting to results quickly is enabled by sharing ideas and methods in the community. The data science team at Pivotal leverages and contributes to this community of publicly available and open source technologies as part of their practice. We will share the resources we use by highlighting specific toolkits for building models (e.g. MADlib, R) and visualization (e.g. Gephi and Circos), along with their benefits and limitations, through examples from Pivotal's data science engagements. By the end of this session we hope to have answered the questions: Where can I get started with data science? Which toolkit is most appropriate for building a model with my dataset? How can I visualize my results to have the greatest impact?
Bio: Sarah Aerni is a member of the Pivotal Data Science team with a focus on healthcare and life science. She has a background in bioinformatics, developing tools to help biomedical researchers understand their data. She holds a B.S. in Biology with a specialization in Bioinformatics and a minor in French Literature from UCSD, and an M.S. and Ph.D. in Biomedical Informatics from Stanford University. During her time as a researcher she focused on the interface between machine learning and biology, building computational models enabling research across a broad range of fields in biomedicine. She also co-founded a start-up providing informatics services to researchers and small companies. At Pivotal she works with customers in life science and healthcare, building models to derive insight and business value from their data.
Given the buzz and data social media has generated in recent times, we feel that Germin8 Social Listening can add a great deal of value to brands.
These could be in the areas of:
Reputation Monitoring/ Management
Campaign Analysis: pre- and post-campaign sentiment analysis
Industry or category Research & Competition Tracking
Lead Generation
Brand & Product Insights/Image
Consumer Attitude & Behavior
Adverse Event Monitoring & Crisis Management
Online Reputation Management
Twitter has recently attracted much attention as a hot research topic in the domain of sentiment analysis. Training sentiment classifiers on tweet data often faces a data sparsity problem, partly due to the large variety of short and irregular forms introduced into tweets because of the 140-character limit. In this work we propose using two different feature sets to alleviate the data sparsity problem. One is the semantic feature set, where we extract semantically hidden concepts from tweets and then incorporate them into classifier training through interpolation. The other is the sentiment-topic feature set, where we extract latent topics and the associated topic sentiment from tweets, then augment the original feature space with these sentiment-topics. Experimental results on the Stanford Twitter Sentiment Dataset show that both feature sets outperform the baseline model using unigrams only. Moreover, using semantic features rivals the previously reported best result, and using sentiment-topic features achieves 86.3% sentiment classification accuracy, outperforming existing approaches.
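The feature augmentation described above can be sketched roughly as follows. This is a toy illustration only: the feature names and the hypothetical topic-model output are invented, not the paper's implementation.

```python
def unigram_features(tweet):
    # Baseline bag-of-words indicator features from the tweet text.
    return {f"has({w})": True for w in tweet.lower().split()}

def augment_with_topics(features, topic_sentiments):
    # Append latent sentiment-topic features (here supplied by a
    # hypothetical topic model) to the original unigram feature space,
    # so a classifier can generalize beyond sparse surface forms.
    augmented = dict(features)
    for topic, sentiment in topic_sentiments:
        augmented[f"topic({topic})={sentiment}"] = True
    return augmented

feats = unigram_features("new phone battery dies fast")
feats = augment_with_topics(feats, [("battery_life", "negative")])
```

The augmented dictionary would then feed a standard classifier in place of the unigram-only features.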
Social media & sentiment analysis - Splunk conf2012 (Michael Wilde)
This presentation was delivered at Splunk's User Conference (conf2012). It covers social media data, how to index and use it with Splunk, and a lot of content around sentiment analysis.
Sentiment Analysis using Python and NLTK (Ashwin Perti)
The presentation covers how to classify sentiments, i.e. sentiment analysis, in particular positive and negative emotions. To classify them we used the Python language with the help of the NLTK package.
Sentiment analysis using a naive Bayes classifier (Dev Sahu)
This ppt contains a short description of the naive Bayes classifier algorithm, a machine learning approach to sentiment detection and text classification.
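For concreteness, a multinomial naive Bayes sentiment classifier with Laplace smoothing can be written in a few dozen lines of plain Python. This is a self-contained sketch with invented toy data, not the slides' code:

```python
import math
from collections import Counter, defaultdict

class NaiveBayesSentiment:
    """Multinomial naive Bayes over unigrams with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        best, best_lp = None, -math.inf
        total_docs = sum(self.class_counts.values())
        for label in self.class_counts:
            # log P(class) + sum of log P(word | class), smoothed.
            lp = math.log(self.class_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in doc.lower().split():
                lp += math.log((self.word_counts[label][w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = label, lp
        return best

clf = NaiveBayesSentiment().fit(
    ["great phone love it", "terrible battery hate it", "love the screen"],
    ["pos", "neg", "pos"])
```

Calling `clf.predict("love this battery")` scores each class by its smoothed log-likelihood and returns the higher-scoring label.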
A large amount of information is available in textual form in databases and online sources, and for many enterprise functions (marketing, maintenance, finance, etc.) it represents a huge opportunity to improve business knowledge.
Sentiment analysis, or opinion mining, refers to the application of language processing to identify and extract subjective information from source materials. Generally speaking, sentiment analysis aims to determine the attitude of a speaker or writer with respect to some topic, or the overall contextual polarity of a document.
Massively Parallel Processing with Procedural Python by Ronert Obst, PyData Be... (PyData)
The Python data ecosystem has grown beyond the confines of single machines to embrace scalability. Here we describe one of our approaches to scaling, which is already being used in production systems. The goal of in-database analytics is to bring the calculations to the data, reducing transport costs and I/O bottlenecks. Using PL/Python we can run parallel queries across terabytes of data using not only pure SQL but also familiar PyData packages such as scikit-learn and nltk. This approach can also be used with PL/R to make use of a wide variety of R packages. We look at examples on Postgres compatible systems such as the Greenplum Database and on Hadoop through Pivotal HAWQ. We will also introduce MADlib, Pivotal’s open source library for scalable in-database machine learning, which uses Python to glue SQL queries to low level C++ functions and is also usable through the PyMADlib package.
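To illustrate the in-database pattern (not the talk's exact code): a PL/Python UDF is an ordinary Python body registered with the database, which the engine then applies in parallel across segments. The scoring function below is a toy stand-in for a real scikit-learn or nltk model:

```python
# In Greenplum/PostgreSQL, a PL/Python UDF wraps an ordinary Python body:
#
#   CREATE FUNCTION score_text(t text) RETURNS float8 AS $$
#       <Python body, with `t` bound per row>
#   $$ LANGUAGE plpythonu;
#
# and plain SQL then runs it in parallel over the data:
#   SELECT id, score_text(body) FROM tweets;

def score_text(t):
    """Toy polarity score: +1 per positive word, -1 per negative word.
    A stand-in for a real model shipped to the data."""
    pos, neg = {"good", "great", "love"}, {"bad", "poor", "hate"}
    words = t.lower().split()
    return float(sum((w in pos) - (w in neg) for w in words))
```

The point of the design is that only the small function travels to the cluster; the terabytes of rows never leave the database.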
Massively Parallel Processing with Procedural Python (PyData London 2014) (Ian Huston)
Pivotal OSS meetup - MADlib and PivotalR (go-pivotal)
With the explosion of big data, the need for fast and inexpensive analytics solutions has become a key basis of competition in many industries. Extracting the value of big data with analytics can be complex, and requires advanced skills.
At Pivotal, we are building open-source solutions (MADlib, PivotalR, PyMadlib) to simplify this process for the user, while maintaining the efficiency necessary for big data analysis.
This talk will provide information about MADlib, an open source library of SQL-based algorithms for machine learning, data mining and statistics that run at large scale within a database engine, with no need for data import/export to other tools.
It provides an overview of the library’s architecture and compares various statistical methods with those available in Apache Mahout.
We also introduce PivotalR, an R-based wrapper for MADlib that gives data scientists and programmers access to the power of MADlib along with the ease of use of R.
Lecture 3: Structuring Unstructured Texts Through Sentiment Analysis (Marina Santini)
Objective of sentiment analysis: given an opinion document d, discover all opinion quintuples (e_i, a_ij, s_ijkl, h_k, t_l) in d. With these quintuples, unstructured data becomes structured data (Bing Liu, Sentiment Analysis and Opinion Mining, 2012).
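Liu's quintuple maps directly onto a structured record; a minimal sketch, with field values invented for illustration:

```python
from typing import NamedTuple

class OpinionQuintuple(NamedTuple):
    """Liu (2012): (entity e_i, aspect a_ij, sentiment s_ijkl,
    opinion holder h_k, time t_l) -- one structured record per opinion."""
    entity: str
    aspect: str
    sentiment: str
    holder: str
    time: str

# One opinion extracted from an invented review sentence:
# "The battery life of this phone is great" -- alice, 2012-05-01
q = OpinionQuintuple("phone", "battery_life", "positive", "alice", "2012-05-01")
```

A collection of such records is exactly the "unstructured to structured" step: free text in, rows you can query out.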
Pivotal Data Labs - Technology and Tools in our Data Scientists' Arsenal (Srivatsan Ramanujam)
These slides give an overview of the technology and the tools used by Data Scientists at Pivotal Data Labs. This includes Procedural Languages like PL/Python, PL/R, PL/Java, PL/Perl and the parallel, in-database machine learning library MADlib. The slides also highlight the power and flexibility of the Pivotal platform from embracing open source libraries in Python, R or Java to using new computing paradigms such as Spark on Pivotal HD.
Data Science at Scale on MPP databases - Use Cases & Open Source Tools (Esther Vasiete)
Pivotal workshop slide deck for Structure Data 2016 held in San Francisco.
Abstract:
Learn how data scientists at Pivotal build machine learning models at massive scale on open source MPP databases like Greenplum and HAWQ (under Apache incubation) using in-database machine learning libraries like MADlib (under Apache incubation) and procedural languages like PL/Python and PL/R to take full advantage of the rich set of libraries in the open source community. This workshop will walk you through use cases in text analytics and image processing on MPP.
This is the presentation I delivered at the Hadoop User Group Ireland meetup in Dublin on Nov 28, 2015. It covers at a glance the architecture of GPDB and, most importantly, its features. Sorry for the colors - Slideshare is crappy with PDFs.
Solution Use Case Demo: The Power of Relationships in Your Big Data (InfiniteGraph)
In this security solution demo, we have integrated Oracle NoSQL DB with InfiniteGraph to demonstrate the power of using the right tools for the solution. By integrating the key value technology of Oracle with the InfiniteGraph distributed graph database, we are able to create new views of existing Call Detail Record (CDR) details to enable discovery of connections, paths and behaviors that may otherwise be missed.
Discover how to add value to your existing Big Data to increase revenues and performance!
Data Science Amsterdam - Massively Parallel Processing with Procedural Languages (Ian Huston)
The goal of in-database analytics is to bring the calculations to the data, reducing transport costs and I/O bottlenecks. With procedural languages such as PL/Python and PL/R, data-parallel queries can be run across terabytes of data using not only pure SQL but also familiar Python and R packages. The Pivotal Data Science team has used this technique to create fraud behaviour models for each individual user in a large corporate network, to understand interception rates at customs checkpoints by accelerating natural language processing of package descriptions, and to reduce customer churn by building a sentiment model on customer call centre records.
http://www.meetup.com/Data-Science-Amsterdam/events/178974942/
Spark For Plain Old Java Geeks (June 2014 Meetup) (sdeeg)
An overview of the Apache Spark project from the perspective of a Java programmer. Topics: what Spark is, the Spark programming model, the Spark ecosystem, and the 1.0 release and why it's a huge milestone.
HCL Z Data Tools (ZDT) is a set of tools that help you to manipulate data stored on z/OS systems interactively and in batch processing. It is designed to deal with extensive production data efficiently while protecting integrity and privacy. ZDT can provide a generic data access solution without programming. Its continued enhancement keeps it ready to meet today's and future mainframe data manipulation requirements.
Read to know more: https://www.hcltechsw.com/zdt
"We can all agree that streaming is super cool. And for a while now, the adoption conversation has been largely led with an all-in mentality. But that’s silly. The only concerns end users have are:
-The freshness of their data
-Latency they require to meet their SLAs from source to consumption
-All while maintaining data quality and governance.
Luckily, the industry has realized this and we have seen a shift of streaming capabilities surfacing as an in-database technology, via objects as trivial to analytics engineers as views - materialized that is. With this convergence of streaming capabilities and batch level accessibility, this is when ELT tools like dbt can join in and expand out the adoption story.
dbt is the T in ELT: Extract, Load, and Transform. In dbt, analytics engineers design models - SQL (and occasionally Python) statements that encapsulate business logic. At runtime, dbt will wrap that logic in a DDL statement and send it over to the data platform to execute.
In this session, we’ll discuss how we see streaming at dbt Labs. We will dive into how we are extending dbt to support low-latency scenarios and the recent additions we have made to make batch and streaming allies in a DAG rather than archenemies."
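To make the dbt mechanics above concrete, here is a minimal, hypothetical sketch (not dbt's actual implementation) of the core idea: a model's SELECT statement gets wrapped in a DDL statement chosen by its materialization, with a streaming-friendly materialized view as one of the options:

```python
# Illustrative sketch of the idea described above: dbt takes a model's SELECT
# and wraps it in DDL chosen by the materialization. Template names here are
# a simplification, not dbt internals.

def wrap_model(name, select_sql, materialization="view"):
    templates = {
        "view": "CREATE VIEW {name} AS\n{sql};",
        "table": "CREATE TABLE {name} AS\n{sql};",
        # Streaming-style materialization on platforms that support it:
        "materialized_view": "CREATE MATERIALIZED VIEW {name} AS\n{sql};",
    }
    return templates[materialization].format(name=name, sql=select_sql)

model = "SELECT order_id, SUM(amount) AS total FROM orders GROUP BY order_id"
print(wrap_model("order_totals", model, "materialized_view"))
```

The point of the convergence the speakers describe is that swapping `"table"` for `"materialized_view"` is all an analytics engineer has to change to move a model from batch to continuously maintained, at least on platforms where the database does the incremental maintenance.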
Analyzing Big Data in R and Scala using Apache Spark 17-7-19 (Ahmed Elsayed)
Data mining lets us make predictions about future data by mining historical data, especially Big Data, with machine learning algorithms running on two cluster frameworks. One is intrinsic to managing the Big Data file system: Hadoop. The other performs fast analysis of Big Data: Apache Spark. To achieve this we will use R via RStudio, or Scala via Zeppelin.
Kamanja: Driving Business Value through Real-Time Decisioning Solutions (Greg Makowski)
This is a first presentation of Kamanja, a new open-source real-time software product, which integrates with other big-data systems. See also links: http://www.meetup.com/SF-Bay-ACM/events/223615901/ and http://Kamanja.org to download, for docs or community support. For the YouTube video, see https://www.youtube.com/watch?v=g9d87rvcSNk (you may want to start at minute 33).
Mankind has stored more than 295 billion gigabytes (295 exabytes) of data since 1986, according to a report by the University of Southern California. Storing and monitoring this data 24/7 in widely distributed environments is a huge task for global service organizations. These datasets require high processing power that traditional databases cannot offer, because the data is stored in an unstructured format. Although the MapReduce paradigm of Java-based Hadoop can address this problem, it does not provide maximum functionality. These drawbacks can be overcome using Hadoop Streaming, which allows users to define non-Java executables for processing these datasets. This paper proposes a THESAURUS model which allows a faster and easier version of business analysis.
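As a sketch of the Hadoop Streaming contract mentioned above: any executable that reads lines on stdin and emits key/value pairs on stdout can act as a mapper or reducer, no Java required. The toy word count below simulates both steps in-process (the data is illustrative; a real job would be launched via the hadoop-streaming jar, which also sorts the mapper output by key before the reduce):

```python
# Hedged sketch of the Hadoop Streaming idea: mapper and reducer are plain
# executables exchanging key/value pairs. Here both steps run in-process
# instead of being launched by Hadoop.

def mapper(lines):
    # Emit (word, 1) for every word, like a streaming mapper writing
    # "word\t1" lines to stdout.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reducer(pairs):
    # Sum counts per key; Hadoop would deliver these grouped/sorted by key.
    counts = {}
    for key, value in pairs:
        counts[key] = counts.get(key, 0) + value
    return counts

data = ["Big Data needs big processing", "big data"]
print(reducer(mapper(data)))  # {'big': 3, 'data': 2, 'needs': 1, 'processing': 1}
```

The same two functions, wrapped in stdin/stdout loops, are exactly what Hadoop Streaming would invoke as `-mapper` and `-reducer` scripts.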
Java in the database - is it really useful? Solving impossible Big Data challenges (Rogue Wave Software)
Since 1999, Oracle has included a Java Virtual Machine (JVM) within the database. That makes it old enough to drive and well past time to get a real job. In today’s data-obsessed world, that job is fortifying Oracle’s database with a healthy dose of analytics to give your database the power to handle the data challenges of the 21st century.
There are numerous advantages to adopting a 100% Java code base for in-database analytics. Security is doubly enhanced by performing all analytics in the database. The code is highly portable, as the identical Java classes that run in the database will run on any client with any operating system. Plus, the modern paradigm of taking the algorithms to the data is elegantly achieved with minimal effort.
Until now, a single Java solution with all these qualities wasn’t available. By using JMSL Numerical Libraries, you get a suite of algorithms with routines for predictive analytics, data mining, regression, forecasting, and data cleaning. JMSL is scalable and can be used in Hadoop MapReduce applications. Now, JMSL Numerical Libraries makes Java in the database more than useful -- it makes it unbeatable.
This webinar walks through the argument of why embedded analytics is better and provides examples using an Oracle database and JMSL.
Webinar recording: https://www.brighttalk.com/webcast/12285/164525
Integration Patterns for Big Data Applications (Michael Häusler)
Big Data technologies like distributed databases, queues, batch processors, and stream processors are fun and exciting to play with. Making them play nicely together can be challenging. Keeping it fun for engineers to continuously improve and operate them is hard. At ResearchGate, we run thousands of YARN applications every day to gain insights and to power user facing features. Of course, there are numerous integration challenges on the way:
* integrating batch and stream processors with operational systems
* ingesting data and playing back results while controlling performance crosstalk
* rolling out new versions of synchronous, stream, and batch applications and their respective data schemas
* controlling the amount of glue and adapter code between different technologies
* modeling cross-flow dependencies while handling failures gracefully and limiting their repercussions
We describe our ongoing journey in identifying patterns and principles to make our big data stack integrate well. Technologies to be covered will include MongoDB, Kafka, Hadoop (YARN), Hive (TEZ), Flink Batch, and Flink Streaming.
Similar to A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal Greenplum Database
Sales and marketing teams in enterprises have too many leads to pursue but limited time and budget at their disposal. To build a strong sales pipeline, marketers should target their prospects with the right content to engage their interest and nurture them before handing them off to their sales teams. Prioritizing the right deals requires effective strategies for scoring leads, and accurately forecasting opportunities helps sales teams identify issues early and meet their targets. In this talk, we will look under the hood of the machine learning pipelines in Salesforce Einstein that help sales and marketing teams win more deals. Specifically, we'll look at the problem of scoring prospects based on their engagement so that marketers know when they are ready to buy. Next, we will share our journey on model interpretability in providing actionable insights with our predictions. Finally, we will describe how we generate scores and insights for all customers through a model tournament, so that enterprises and small businesses alike can reap the benefits of machine learning.
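The "model tournament" idea can be sketched in a few lines: train several candidate scorers and keep whichever does best on held-out validation data. Everything below (the models, the engagement feature, the data) is an illustrative toy, not the Einstein pipeline:

```python
# Toy "model tournament": evaluate candidate lead scorers on validation
# data and pick the winner. Models and data are illustrative stand-ins.

def score_constant(lead):        # baseline: every lead gets 0.5
    return 0.5

def score_by_engagement(lead):   # more engagement -> higher score
    return min(1.0, lead["email_opens"] / 10.0)

def accuracy(model, validation):
    correct = 0
    for lead, converted in validation:
        predicted = model(lead) >= 0.5   # score above threshold -> "will buy"
        correct += (predicted == converted)
    return correct / float(len(validation))

validation = [
    ({"email_opens": 9}, True),
    ({"email_opens": 1}, False),
    ({"email_opens": 7}, True),
    ({"email_opens": 0}, False),
]

candidates = {"constant": score_constant, "engagement": score_by_engagement}
winner = max(candidates, key=lambda name: accuracy(candidates[name], validation))
print(winner)  # engagement
```

Running a tournament per customer, as the talk describes, is the same loop with each customer's own data deciding which candidate wins.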
Approaches and Open Source Tools for Wrangling and Modeling Massive Datasets (Sarah Aerni)
Text Analytics at Scale on MPP (Srivatsan Ramanujam)
A Scalable Framework For Real Time Monitoring & Prediction Of Sensor Data (Jarrod Vawdrey)
Climate Data Lake: Empowering Citizen Scientists in Acadia National Park (Srivatsan Ramanujam)
Learn how EMC and Pivotal are teaming up to empower citizen scientists @ Acadia National Park to study climate change and its influence on phenology in the park, by building a Climate Data Lake.
PyMADlib - A Python wrapper for MADlib: in-database, parallel, machine learn... (Srivatsan Ramanujam)
These are slides from my talk @ DataDay Texas, in Austin on 30 Mar 2013
(http://2013.datadaytexas.com/schedule)
Favorite and Fork PyMADlib on GitHub: https://github.com/gopivotal/pymadlib
MADlib: http://madlib.net
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Connector Corner: Automate dynamic content and events by pushing a button (DianaGray10)
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Generating a custom Ruby SDK for your web service or Rails API using Smithy (g2nightmarescribd)
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We closed with a lovely workshop in which the participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and on application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... (Jeffrey Haguewood)
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 (Tobias Schneck)
As AI technology pushes into IT, I have been wondering, as an “infrastructure container Kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations point of view. Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and give you a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply them to our own infrastructure and make them work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could be beneficial for, or limiting to, your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I already have working for real.
Key Trends Shaping the Future of Infrastructure (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open source: exploring how these areas are likely to mature and develop over the short and long term, and then considering how organisations can position themselves to adapt and thrive.
Why do we care about Apps as well as Data? By ‘apps’ we mean “enterprise and cloud applications” and how they are built. Pivotal has said a lot about data in public, but we care about apps just as much. Leveraging the strengths of vFabric and Spring, Pivotal will continue to enable customers to build the applications they need. Applications are how our customers offer many new products and services today. Apps can accelerate customer interactions and realize value from data by presenting it to users in a meaningful way. With tools like Spring and Cloud Foundry, we can make ‘big data’ comprehensible and ‘easy’ for developers, and hence for enterprises. And of course: users generate data, sensors generate data, phones generate data, but much of this data comes from some sort of application.