Trend Micro collects a large volume of threat knowledge data for its clients, covering many different web threat entities. Most threat entities are observed together with relationships, such as malicious behaviors or interaction chains among them. We therefore built a graph model on HBase to store all known threat entities and their relationships, allowing clients to query threat relationships starting from any given entity. This presentation covers the problems we set out to solve, the design decisions we made and why, how we designed the graph model, and the graph computation tasks involved.
Presentation given for the SQLPass community at SQLBits XIV in London. The presentation is an overview of the performance improvements brought to Hive by the Stinger initiative.
From the Hadoop Summit 2015 Session with Tomer Shiran.
To deliver real-time impact from big data, organizations must evolve beyond traditional analytic approaches to support a new class of agile, distributed applications. Real-time Hadoop moves beyond batch programs reliant on data transformations and schema management. This session highlights how leading organizations are leveraging Hadoop and NoSQL to merge analytics with production data and make adjustments while business is happening, optimizing revenue, mitigating risk, and reducing operational costs. Details include how companies have achieved real-time impact on their business, collapsed data silos, and automated in-line analytics with operational data for immediate impact.
Hadoop Infrastructure @Uber: Past, Present and Future — DataWorks Summit
Uber’s mission is to provide transportation as reliable as running water, and data plays a critical role in fulfilling that mission. Within Uber’s data infrastructure, Hadoop is a central component. We will talk about the journey of Hadoop at Uber and our future plans for scaling to billions of trips. We will cover Uber’s most distinctive use cases and how the Hadoop ecosystem we built helped us along the way: how we scaled from 10 to 2,000 nodes and plan to scale to tens of thousands, our mistakes, lessons, and wins, and how we process billions of events per day. We will also discuss the unique challenges and real-world use cases involved in co-locating Uber’s service architecture with batch workloads (e.g., data pipelines, machine learning, and analytics). Uber has made many improvements to the Hadoop ecosystem and solved some problems in ways not tried before; this presentation should serve as an example and encourage the audience to enhance the ecosystem themselves, growing these projects’ communities and the big data space as a whole. The intended audience is anyone working on big data who wants to understand how to scale Hadoop and its ecosystem to tens of thousands of nodes. The talk will help them understand the Hadoop ecosystem and how to use it efficiently, and will introduce some of the technologies the Uber team is building in the big data space.
Practical Machine Learning: Innovations in Recommendation Workshop — MapR Technologies
Ted Dunning, committer for Apache Mahout, Drill & ZooKeeper, presents on:
1. How to build a production quality recommendation engine using Mahout and Solr or Elasticsearch
2. How to build a multi-modal recommendation from multiple behavioral inputs
3. How search engines can be used for more than just text
This talk will present a detailed tear-down and walk-through of a working soup-to-nuts recommendation engine that uses observations of multiple kinds of behavior to do combined recommendation and cross recommendation. The system uses Mahout to do off-line analysis and can use Solr or Elasticsearch to provide real-time recommendations. The talk will also include enough theory to provide useful working intuitions for those desiring to adapt this design.
The entire system including a data generator, off-line analysis scripts, Solr and Elasticsearch configurations and sample web pages will be made available on github for attendees to modify as they like.
Building recommendation engines by abusing a search engine has been well known for some time to a small sub-culture in the recommendation community, but techniques for building multi-modal recommendation engines are not at all well known.
How Big Data is Reducing Costs and Improving Outcomes in Health Care — Carol McDonald
There is no better example of the important role that data plays in our lives than in matters of our health and our healthcare. There’s a growing wealth of health-related data out there, and it’s playing an increasing role in improving patient care, population health, and healthcare economics.
Join this talk to hear how MapR customers are using big data and advanced analytics to address a myriad of healthcare challenges—from patient to payer.
We will cover big data healthcare trends and production use cases that demonstrate how to deliver data-driven healthcare applications.
Applying Machine Learning to IoT: End-to-End Distributed Pipeline... — Carol McDonald
This talk discusses the architecture of an end-to-end application that combines streaming data with machine learning to analyze and visualize, in real time, where and when Uber cars are clustered, revealing the most popular Uber locations.
Hadoop clusters can store nearly everything in your data lake cheaply and blazingly fast. Answering questions and gaining insights from this ever-growing stream becomes the decisive part for many businesses. Increasingly, data has a natural structure as a graph, with vertices linked by edges, and many questions about the data involve graph traversals or other complex queries for which there is no a priori bound on path length.
Spark with GraphX is great for answering relatively simple graph questions that are worth starting a Spark job for, because they essentially involve the whole graph. But does it make sense to start one for every ad-hoc query, and is it suitable for complex real-time queries?
In this talk I will introduce an alternative solution that adds those features to an existing Hadoop/Spark setup and enables real-time insights. I will address the following topics:
* Challenges in gaining deeper insights from large amounts of graph data
* Benefits and limitations of graph analysis with Spark
* Introduction to ArangoDB SmartGraphs
* Deployment of Hadoop, Spark and ArangoDB using DC/OS
* Performing complex queries on billions of vertices and edges leveraging ArangoDB SmartGraphs (Live Demo)
Sherlock: an anomaly detection service on top of Druid — DataWorks Summit
Sherlock is an anomaly detection service built on top of Druid. It leverages EGADS (Extensible Generic Anomaly Detection System; github.com/yahoo/egads) to detect anomalies in time-series data. Users can schedule jobs on an hourly, daily, weekly, or monthly basis, view anomaly reports from Sherlock's interface, or receive them via email.
Sherlock has four major components: time-series generation, EGADS anomaly detection, a Redis backend, and a Spark Java UI. Time-series generation involves building and validating the Druid query, querying Druid, and parsing the response. The parsed Druid response is then fed to the EGADS anomaly detection component, which detects anomalies and generates a report for each input time series. Sherlock uses the Redis backend to store job metadata, generated anomaly reports, a persistent job queue for scheduling, etc.; Redis can run clustered or standalone. Sherlock’s user interface is built with Spark Java and lets users submit instant anomaly analyses, create and launch detection jobs, and view anomalies on a heatmap or a graph. Jigarkumar Patel, Software Development Engineer I, Oath Inc., and David Servose, Software Systems Engineer, Oath
AWS Summit 2013 | Singapore - Security & Compliance and Integrated Security w... — Amazon Web Services
We’ve entered a new connectivity-oriented world where we can access information any time, any place, on any device, 24 hours a day, and cloud computing is a major enabler of this flexibility. Like you, more and more businesses are looking to the cloud for better, faster, more powerful, and affordable communications, and while many would think that security in the cloud is much different, the reality is less dramatic. Moving to the cloud still requires proven security techniques, sometimes applied in new and dynamic ways that adapt to the elastic nature of cloud architecture. Join us as we discuss the latest cloud security solutions, including real-world examples of how organizations like yours are succeeding against new and evolving threats. We will examine security considerations beyond what is provided by security-conscious cloud providers like Amazon Web Services, and what additional factors you might want to think about when deploying to the cloud.
This talk is about how to secure your front-end and back-end applications using a RESTful approach. As opposed to traditional, monolithic server-side applications (where the HTTP session is used), when your front-end application runs in a browser rather than securely on the server, there are a few things you need to consider. In this session Alvaro will explore standards like OAuth and JWT to achieve stateless, token-based authentication and authorisation, and will survey existing implementations. More specifically, the demonstration will use Spring Security REST, a popular Grails plugin written by Álvaro.
Authentication is normally a stateful service. Most implementations rely on the HTTP session, thereby introducing state, since the session is an in-memory data structure in the application server.
In the microservices era, most companies are developing so-called RESTful services, where one of the principles is to build stateless systems. In such a scenario, authentication should be stateless too.
There is a standard specification for securing web applications and APIs that is being massively adopted by the industry: OAuth 2. The specification doesn't explicitly cover how to make a stateless implementation, and most existing ones depend on some sort of external storage (such as a DB) to store the generated tokens for later validation.
Fortunately, there is another specification by the IETF called JSON Web Token, that can be combined with OAuth 2 to achieve a stateless authentication system.
In the session, Alvaro will explain the core concepts of OAuth 2 and JWT, and how they can be used together to achieve the last two letters of REST: State Transfer.
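As a rough sketch of the stateless idea described above (not the Spring Security REST plugin's actual API), a signed JWT can carry the authenticated subject so that no server-side session or token store is needed. The jjwt library usage, secret, and expiry below are illustrative assumptions:

import io.jsonwebtoken.Claims;
import io.jsonwebtoken.Jwts;
import io.jsonwebtoken.SignatureAlgorithm;

public class StatelessTokens {
    // Hypothetical shared secret; in practice this comes from configuration.
    private static final String SECRET = "change-me";

    // Issue a signed token after the user authenticates (e.g. via OAuth 2).
    public static String issue(String username) {
        return Jwts.builder()
                .setSubject(username)
                .setExpiration(new java.util.Date(System.currentTimeMillis() + 3600_000L))
                .signWith(SignatureAlgorithm.HS256, SECRET)
                .compact();
    }

    // Validate the token on each request; no session or DB lookup is needed.
    public static String validate(String token) {
        Claims claims = Jwts.parser()
                .setSigningKey(SECRET)
                .parseClaimsJws(token)
                .getBody();
        return claims.getSubject(); // throws if the signature or expiry is invalid
    }
}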
Chat bot making process using Python 3 & TensorFlow — Jeongkyu Shin
Since 2015, chat bots have become the center of public attention as a new mobile user interface. Chat bots are widely used to reduce human-to-human interaction, from consultation to online shopping and negotiation, and their application coverage is still expanding. Chat bots are also the basis of conversational interfaces and, combined with voice recognition, of non-physical input interfaces.
Traditional chat bots were developed using natural language processing (NLP) and Bayesian statistics for user-intention recognition and template-based responses. Since 2012, however, accelerated advances in deep learning and deep-learning-based NLP have opened the possibility of creating chat bots with machine learning. Machine learning (ML)-based chat bot development has advantages; for instance, once the model is trained to an appropriate level, ML-based bots can generate (somewhat nonsensical but acceptable) responses to arbitrary questions that have no connection with the context.
In this talk, I will introduce the garage chat bot creation process step by step. I share the idea and implementation of a multi-modal machine learning model with a context engine and a conversation engine, and also discuss how to implement Korean natural language processing, continuous conversation, and tone manipulation.
Chat bots have been attracting attention since 2015 as a new user UI, centered on mobile. They are used in a variety of areas, from reducing human-to-human interaction in consultations to online shopping purchases, and their scope keeps expanding. Chat bots are the foundation of conversational interfaces and, combined with voice recognition, the underlying technology of no-input interfaces.
Existing chat bots were developed on the basis of natural-language analysis and Bayesian statistics, recognizing user-intent patterns and producing templated responses. However, deep learning, which has advanced rapidly since 2012, and the natural-language recognition technology built on it, have opened the possibility of building chat bots with machine learning. When a chat bot is developed through machine learning, once a sufficiently trained model is built, it can generate reasonable answers even to arbitrary sentences outside the context, depending on the training data.
This talk covers the problems encountered, and their solutions, when building a deep-learning-based chat bot with Python 3 and TensorFlow. Along with the idea of implementing and connecting a polymorphic model between the bot's context engine and conversation engine, I will share ideas, implementations, and tips on how to model natural-language processing, continuous conversation, and speech-style handling.
Lean Analytics for Intrapreneurs (Lean Startup Conf 2013) — Lean Analytics
Lean Analytics for Intrapreneurs workshop by Alistair Croll, based on the Lean Analytics book and on research done with dozens of large organizations on how they're using data, analytics, and Lean principles to innovate and improve.
Microservices architectures are changing the way that organizations build their applications and infrastructure. Companies can now achieve new levels of scale and efficiency by disaggregating their large, monolithic applications into small, independent “microservices,” each of which performs a different function. In this session, we’ll introduce the concept of microservices, help you evaluate whether your organization is ready for microservices, and discuss methods for implementing these architectures.
An Overview of AI on the AWS Platform - February 2017 Online Tech Talks — Amazon Web Services
AWS offers a family of intelligent services that provide cloud-native machine learning and deep learning technologies to address your different use cases and needs. For developers looking to add managed AI services to their applications, AWS brings natural language understanding (NLU) and automatic speech recognition (ASR) with Amazon Lex, visual search and image recognition with Amazon Rekognition, text-to-speech (TTS) with Amazon Polly, and developer-focused machine learning with Amazon Machine Learning.
For more in-depth deep learning applications, the AWS Deep Learning AMI lets you run deep learning in the cloud, at any scale. Launch instances of the AMI, pre-installed with open source deep learning engines (Apache MXNet, TensorFlow, Caffe, Theano, Torch and Keras), to train sophisticated, custom AI models, experiment with new algorithms, and learn new deep learning skills and techniques; all backed by auto-scaling clusters of GPU-based instances.
Whether you’re just getting started with AI or you’re a deep learning expert, this session will provide a meaningful overview of how to improve scale and efficiency with the AWS Cloud.
Learning Objectives
• Learn about the breadth of AI services available on the AWS Cloud
• Gain insight into practical use cases for Amazon Lex, Amazon Polly, and Amazon Rekognition
• Understand why Amazon has selected MXNet as its deep learning framework of choice due to its programmability, portability, and performance
Exploratory data analysis is the process of quickly looking at data, formulating hypotheses, and testing those hypotheses. In practice, two of the most important components of this process are transforming data and visualizing it. This tutorial will be a hands-on, practical introduction to using R for data exploration, with an emphasis on data transformation and visualization. I will focus on using modern R packages like ggplot2, dplyr, and tidyr for this tutorial.
In an era when data science is becoming mainstream, whichever of data's several Vs you face, the most important one is always Value. Data mining is a technique for systematically clarifying the structure of data and finding the valuable features and correlations within it. This six-hour course takes the most practical angle, sharing how to turn pressing real-world problems into problems that data mining techniques can handle, and how to use the powerful tools of the R language to perform association analysis, regression analysis, and cluster analysis, with the final goal of digging out the information hidden in the data.
This course guides those who are unfamiliar with but interested in data analysis through the complete process of using R: from initial data collection and exploratory analysis to text mining, uncovering meanings that are invisible to the naked eye and hidden beneath the data. It is designed for people with a basic knowledge of R who want to get more hands-on with practical analysis; by the end you should be more familiar with R as a rich analysis toolkit. Using a dataset of charitable donations from the Apple Daily, you will learn how to parse web pages from scratch and write crawlers to collect information automatically; how to handle the collected data flexibly for cleaning, integration, and exploration; and how to use off-the-shelf packages for text mining and document analysis. We will walk step by step through the data-analysis journey, processing, observing, and deconstructing the data, to see what factors actually influence people's donation decisions and how those findings are mined from the data.
This presentation describes how Trend Micro SPN uses HBase to solve a graph-model problem, and how we process our graph data with PageRank to make predictions. We have also published a partial implementation of our graph solution, named HGraph, on GitHub for anyone interested in this topic.
Big Data and New Challenges for DBAs (Michael Naumov, LivePerson)
Hadoop has become a popular platform for managing large datasets of structured and unstructured data. It does not replace existing infrastructures, but instead augments them. Most companies will still use relational databases for transactional processing and low-latency queries, but can benefit from Hadoop for reporting, machine learning or ETL. This session will cover:
What is Hadoop and why do I care?
What do people do with Hadoop?
How can SQL Server DBAs add Hadoop to their architecture?
The millions of people who use Spotify each day generate a lot of data, roughly a few terabytes per day. What does it take to handle datasets of that scale, and what can be done with them? I will briefly cover how Spotify uses data to provide a better music-listening experience and to strengthen its business. Most of the talk will be spent on our data processing architecture and how we leverage state-of-the-art data processing and storage tools such as Hadoop, Cassandra, Kafka, Storm, Hive, and Crunch. Last, I'll present observations and thoughts on innovation in the data processing (a.k.a. Big Data) field.
Sept 17 2013 - THUG - HBase a Technical Introduction — Adam Muise
HBase Technical Introduction. This deck includes a description of memory design, write path, read path, some operational tidbits, SQL on HBase (Phoenix and Hive), as well as HOYA (HBase on YARN).
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory... — Cloudera, Inc.
Mignify is a platform for collecting, storing, and analyzing Big Data harvested from the web. It aims at providing easy access to focused and structured information extracted from web data flows. It consists of a distributed crawler, a resource-oriented storage based on HDFS and HBase, and an extraction framework that produces filtered, enriched, and aggregated data from large document collections, including the temporal aspect. The whole system is deployed on an innovative hardware architecture comprising a high number of small (low-consumption) nodes. This talk will tackle the decisions made along the design and development of the platform, from both a technical and a functional perspective. It will introduce the cloud infrastructure, the LTE-like ingestion of the crawler output into HBase/HDFS, and the triggering mechanism of analytics based on a declarative filter/extraction specification. The design choices will be illustrated with a pilot application targeting Daily Web Monitoring in the context of a national domain.
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013) — VMware Tanzu
Recorded at SpringOne2GX 2013 in Santa Clara, CA
Speaker: Adam Shook
This session assumes absolutely no knowledge of Apache Hadoop and will provide a complete introduction to all the major aspects of the Hadoop ecosystem of projects and tools. If you are looking to get up to speed on Hadoop, trying to work out what all the Big Data fuss is about, or just interested in brushing up your understanding of MapReduce, then this is the session for you. We will cover all the basics with detailed discussion about HDFS, MapReduce, YARN (MRv2), and a broad overview of the Hadoop ecosystem including Hive, Pig, HBase, ZooKeeper and more.
Learn More about Spring XD at: http://projects.spring.io/spring-xd
Learn More about Gemfire XD at:
http://www.gopivotal.com/big-data/pivotal-hd
Apache Hive and HBase are very popular projects in the Hadoop ecosystem. Using Hive with HBase was made possible by contributions from Facebook around 2010. In this talk, we will go over the details of how the integration works, and talk about recent improvements. Specifically, we will cover the basic architecture, schema and data type mappings, and recent filter pushdown optimizations. We will also go into detail about the security aspects of Hadoop/HBase related to Hive setups.
Prosigns: Transforming Business with Tailored Technology Solutions — Prosigns
Unlocking Business Potential: Tailored Technology Solutions by Prosigns
Discover how Prosigns, a leading technology solutions provider, partners with businesses to drive innovation and success. Our presentation showcases our comprehensive range of services, including custom software development, web and mobile app development, AI & ML solutions, blockchain integration, DevOps services, and Microsoft Dynamics 365 support.
Custom Software Development: Prosigns specializes in creating bespoke software solutions that cater to your unique business needs. Our team of experts works closely with you to understand your requirements and deliver tailor-made software that enhances efficiency and drives growth.
Web and Mobile App Development: From responsive websites to intuitive mobile applications, Prosigns develops cutting-edge solutions that engage users and deliver seamless experiences across devices.
AI & ML Solutions: Harnessing the power of Artificial Intelligence and Machine Learning, Prosigns provides smart solutions that automate processes, provide valuable insights, and drive informed decision-making.
Blockchain Integration: Prosigns offers comprehensive blockchain solutions, including development, integration, and consulting services, enabling businesses to leverage blockchain technology for enhanced security, transparency, and efficiency.
DevOps Services: Prosigns' DevOps services streamline development and operations processes, ensuring faster and more reliable software delivery through automation and continuous integration.
Microsoft Dynamics 365 Support: Prosigns provides comprehensive support and maintenance services for Microsoft Dynamics 365, ensuring your system is always up-to-date, secure, and running smoothly.
Learn how our collaborative approach and dedication to excellence help businesses achieve their goals and stay ahead in today's digital landscape. From concept to deployment, Prosigns is your trusted partner for transforming ideas into reality and unlocking the full potential of your business.
Join us on a journey of innovation and growth. Let's partner for success with Prosigns.
Developing Distributed High-performance Computing Capabilities of an Open Sci... — Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart... — Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data and applying computations on a different system. As part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on demand and are capable of applying many data-reduction and data-analysis operations to the large ESGF data archives, transferring only the resulting analysis products (e.g., visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
A Comprehensive Look at Generative AI in Retail App Testing.pdf — kalichargn70th171
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
May Marketo Masterclass, London MUG May 22 2024.pdf — Adele Miller
Can't make Adobe Summit in Vegas? No sweat because the EMEA Marketo Engage Champions are coming to London to share their Summit sessions, insights and more!
This is a MUG with a twist you don't want to miss.
Understanding Globus Data Transfers with NetSage — Globus
NetSage is an open, privacy-aware network measurement, analysis, and visualization service designed to help end users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks worldwide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ... — Juraj Vysvader
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc. I didn't get rich from it, but it did have 63K downloads (powering possibly tens of thousands of websites).
Field Employee Tracking System | MiTrack App | Best Employee Tracking Solution... — informapgpstrackings
Keep tabs on your field staff effortlessly with Informap Technology Centre LLC. Real-time tracking, task assignment, and smart features for efficient management. Request a live demo today!
For more details, visit us : https://informapuae.com/field-staff-tracking/
Providing Globus Services to Users of JASMIN for Environmental Data Analysis — Globus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
Navigating the Metaverse: A Journey into Virtual Evolution — Donna Lenk
Join us for an exploration of the Metaverse's evolution, where innovation meets imagination. Discover new dimensions of virtual events, engage with thought-provoking discussions, and witness the transformative power of digital realms.
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam — takuyayamamoto1800
In these slides, we show a simulation example and how to compile the solvers.
The Helmholtz equation can be solved with helmholtzFoam; the Helmholtz equation with uniformly dispersed bubbles can be simulated with helmholtzBubbleFoam.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv... — Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
Top Nidhi software solution free download — vrstrong314
This presentation emphasizes the importance of data security and legal compliance for Nidhi companies in India. It highlights how online Nidhi software solutions, like Vector Nidhi Software, offer advanced features tailored to these needs. Key aspects include encryption, access controls, and audit trails to ensure data security. The software complies with regulatory guidelines from the MCA and RBI and adheres to Nidhi Rules, 2014. With customizable, user-friendly interfaces and real-time features, these Nidhi software solutions enhance efficiency, support growth, and provide exceptional member services. The presentation concludes with contact information for further inquiries.
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster, who review updates to the Globus platform and service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient... — Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. This is where custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
A Graph Service for Global Web Entities Traversal and Reputation Evaluation Based on HBase
1. A Graph Service for Global Web Entities Traversal and Reputation Evaluation Based on HBase — Chris Huang, Scott Miao, 2014/5/5
2. Who are we
• Scott Miao
– Developer, SPN, Trend Micro
– Worked on the Hadoop ecosystem since 2011
– Expertise in HDFS/MR/HBase
– Contributor for HBase/HDFS
– @takeshi.miao
• Chris Huang
– RD Manager, SPN, Trend Micro
– Hadoop Architect
– Worked on the Hadoop ecosystem since 2009
– Contributor for Bigtop
– @chenhsiu48
Our blog ‘Dumbo in TW’: http://dumbointaiwan.blogspot.tw/
11. Threat Entities Relation Graph
[Diagram: a graph of linked threat entities; node types — F = File, I = IP, D = Domain, E = Email]
12. Most Entity Reputations are Unknown
[Diagram: the same entity-relation graph, with most nodes unrated; node types — F = File, I = IP, D = Domain, E = Email]
13. Security Solution Dilemma – Long Tail
[Long-tail chart: y-axis = Prevalence, x-axis = Entities; the head of the curve is the known good/bad entities covered by traditional heuristic detection]
How can we detect the rest effectively? Big Data can help!
14. Inspired by PageRank
• Too many un-visited pages!
• Users browse pages through links
• Let users’ clicks (BIG DATA) tell us the rankings of those un-visited pages!
15. Revised PageRank Algorithm
• Too many un-rated threat entities!
• Malware activities interact with threat entities
• Let malware’s behaviors (BIG DATA) tell us the reputations of those un-rated threat entities!
[Diagram: the threat-entity relation graph again, with reputation flowing along edges]
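The deck gives no code for the revised algorithm, but the underlying idea — seeding known-bad entities with a score and propagating it along edges, PageRank-style, so that unrated entities accumulate reputations — can be sketched as below. The adjacency map, damping factor, and seeding scheme are illustrative assumptions, not the authors' implementation:

import java.util.*;

public class ReputationRank {
    // One PageRank-style iteration over an adjacency list.
    // graph: vertex id -> ids of vertices it links to
    // score: current reputation per vertex (seeded high for known-bad entities)
    static Map<String, Double> iterate(Map<String, List<String>> graph,
                                       Map<String, Double> score, double damping) {
        Map<String, Double> next = new HashMap<>();
        for (String v : graph.keySet()) {
            next.put(v, 1.0 - damping); // base score for every vertex
        }
        for (Map.Entry<String, List<String>> e : graph.entrySet()) {
            List<String> outs = e.getValue();
            if (outs.isEmpty()) continue;
            double share = damping * score.getOrDefault(e.getKey(), 0.0) / outs.size();
            for (String out : outs) {
                next.merge(out, share, Double::sum); // spread reputation along edges
            }
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = new HashMap<>();
        g.put("badDomain", Arrays.asList("ip1", "url1")); // hypothetical entities
        g.put("ip1", Arrays.asList("url1"));
        g.put("url1", Collections.<String>emptyList());

        Map<String, Double> score = new HashMap<>();
        score.put("badDomain", 1.0); // known-bad seed

        for (int i = 0; i < 23; i++) { // the talk reports 23 iterations at scale
            score = iterate(g, score, 0.85);
        }
        System.out.println(score); // unknowns near known-bad entities rank higher
    }
}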
17. The Problems
• Storing large-scale graph data
• Accessing large-scale graph data
• Processing large-scale graph data
18. Data volume
• Dump ~450MB (150 bytes * 3,000,000 records) of data into the Graph per day
– Extracted from 3GB of data
• Keep it for 3 months
– ~450MB * 90 = ~40,500MB = ~39GB
– With Snappy compression: ~20–22GB
• Dataset
– ~40,000,000 vertices and ~100,000,000 edges
• Query volume is in the hundreds of thousands per day
21. Property Graph Model
[Diagram: an example property graph from a soap opera — vertices such as {Name: Jack, Sex: Male, Marriage: true}, {Name: Merry, Sex: Female, Marriage: true}, {Name: Vivian, Sex: Female, Marriage: false}, {Name: John, Sex: Male, Marriage: true}, and {Name: Emily, Sex: Female, Marriage: true}, connected by ‘marries’ edges, one carrying the property Date: 2010/5/5]
https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
22. The winner is…
• We use HBase as the Graph storage
– Google BigTable and PageRank
– HBaseCon2012: Storing and manipulating graphs in HBase
• Massively scalable? Active community? Analyzable?
23. Use HBase to store Graph data (1/3)
• Tables
create 'vertex', {NAME => 'property', BLOOMFILTER => 'ROW', COMPRESSION => 'SNAPPY', TTL => '7776000'}
create 'edge', {NAME => 'property', BLOOMFILTER => 'ROW', COMPRESSION => 'SNAPPY', TTL => '7776000'}
24. Use HBase to store Graph data (2/3)
• Schema design
– Table: vertex
'<vertex-id>||<entity-type>', 'property:<property-key>@<property-value-type>', <property-value>
– Table: edge
'<vertex1-row-key>--><label>--><vertex2-row-key>', 'property:<property-key>@<property-value-type>', <property-value>
25. Use HBase to store Graph data (3/3)
• Sample
– Table: vertex
'myapps-ups.com||domain', 'property:ip@String', '…'
'myapps-ups.com||domain', 'property:asn@String', '…'
…
'track.muapps-ups.com/InvoiceA1423AC.JPG.exe||url', 'property:path@String', '…'
'track.muapps-ups.com/InvoiceA1423AC.JPG.exe||url', 'property:parameter@String', '…'
– Table: edge
'myapps-ups.com||domain-->host-->track.muapps-ups.com/InvoiceA1423AC.JPG.exe||url', 'property:property1', '…'
'myapps-ups.com||domain-->host-->track.muapps-ups.com/InvoiceA1423AC.JPG.exe||url', 'property:property2', '…'
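To make the row layout concrete, here is a minimal sketch of writing one vertex row and one edge row with the 0.94-era HBase client API, following the conventions from slides 24–25; the connection setup and property values are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class GraphWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // Vertex row: '<vertex-id>||<entity-type>'
        HTable vertexTable = new HTable(conf, "vertex");
        Put vertex = new Put(Bytes.toBytes("myapps-ups.com||domain"));
        vertex.add(Bytes.toBytes("property"),   // column family
                   Bytes.toBytes("ip@String"),  // '<property-key>@<type>'
                   Bytes.toBytes("1.2.3.4"));   // illustrative value
        vertexTable.put(vertex);
        vertexTable.close();

        // Edge row: '<vertex1-row-key>--><label>--><vertex2-row-key>'
        HTable edgeTable = new HTable(conf, "edge");
        Put edge = new Put(Bytes.toBytes(
            "myapps-ups.com||domain-->host-->track.muapps-ups.com/InvoiceA1423AC.JPG.exe||url"));
        edge.add(Bytes.toBytes("property"),
                 Bytes.toBytes("property1@String"),
                 Bytes.toBytes("some-value"));  // illustrative value
        edgeTable.put(edge);
        edgeTable.close();
    }
}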
26. Keep your rowkey length short
• With a long rowkey
– It does not impact your query performance
– But it does impact your algorithm MR jobs
• OutOfMemoryException
• Use something like a HASH function to keep your rowkey short
– Use the hash value as the rowkey
– Put the original value into a property
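A minimal sketch of that advice, assuming MD5 (the deck does not name a specific hash): store the digest as the rowkey and keep the original id as a property so it can still be recovered:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class RowKeys {
    // Derive a short, fixed-length rowkey from an arbitrarily long vertex id.
    static String shortRowKey(String vertexId) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] digest = md5.digest(vertexId.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString(); // always 32 hex chars, regardless of id length
    }

    public static void main(String[] args) throws Exception {
        String id = "track.muapps-ups.com/InvoiceA1423AC.JPG.exe||url";
        // Store shortRowKey(id) as the rowkey, and id itself under a
        // property column (e.g. 'property:originalId@String', an assumed name).
        System.out.println(shortRowKey(id));
    }
}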
29. Preprocess and Dump Data
• The HBase schema design is simple and human-readable
• It is easy to write your own dumping tool if needed
– MR / Pig / completebulkload
– You can write cron jobs to clean up broken-edge data
– TTL can also help retire old data
• We already have a lot of practice with these tasks
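As an illustration of the broken-edge cleanup mentioned above, here is a single-process sketch that scans the edge table and deletes edges whose endpoints have expired (a production job would be an MR; the key parsing follows the slide-24 schema, the rest is assumed):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class BrokenEdgeCleaner {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable vertices = new HTable(conf, "vertex");
        HTable edges = new HTable(conf, "edge");

        ResultScanner scanner = edges.getScanner(new Scan());
        for (Result r : scanner) {
            String rowKey = Bytes.toString(r.getRow());
            // Edge rowkey: '<vertex1-row-key>--><label>--><vertex2-row-key>'
            String[] parts = rowKey.split("-->");
            if (parts.length != 3) continue;
            boolean headExists = vertices.exists(new Get(Bytes.toBytes(parts[0])));
            boolean tailExists = vertices.exists(new Get(Bytes.toBytes(parts[2])));
            if (!headExists || !tailExists) {
                // One endpoint has retired (e.g. via TTL); drop the dangling edge.
                edges.delete(new Delete(r.getRow()));
            }
        }
        scanner.close();
        vertices.close();
        edges.close();
    }
}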
32. Get Data (1/2)
• A Graph API
• A better semantic for manipulating Graph data
– As a wrapper around the HBase client API
– Rather than using the HBase client API directly
• A malware exploring sample

Vertex vertex = this.graph.getVertex("malware");
Vertex subVertex = null;
Iterable<Edge> edges =
    vertex.getEdges(Direction.OUT, "connect", "infect", "trigger");
for (Edge edge : edges) {
    subVertex = edge.getVertex(Direction.IN); // the far endpoint of each outgoing edge
    ...
}
33. Get Data (2/2)
• We implement the Blueprints API
– It provides interfaces as a spec for users to implement
• 824 stars, 173 forks on GitHub
– We get more benefits from it
• Plug-and-play across different Blueprints-enabled graph backends
• Traversal language, RESTful server, dataflow, etc.
• http://www.tinkerpop.com/
– Currently the basic query methods are implemented
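Because HGraph implements the Blueprints interfaces, client code can stay backend-agnostic. A small sketch of what that buys (only the Blueprints types are real; the graph wiring and entity id are assumed):

import com.tinkerpop.blueprints.Direction;
import com.tinkerpop.blueprints.Edge;
import com.tinkerpop.blueprints.Graph;
import com.tinkerpop.blueprints.Vertex;

public class ExploreEntity {
    // Works against any Blueprints-enabled backend, HGraph included.
    static void printNeighbors(Graph graph, Object entityId) {
        Vertex v = graph.getVertex(entityId);      // hypothetical entity id
        for (Edge e : v.getEdges(Direction.OUT)) {
            Vertex neighbor = e.getVertex(Direction.IN);
            System.out.println(e.getLabel() + " -> " + neighbor.getId());
        }
    }
}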
34. Clients
• Real-time client
– Client systems
• They need the associated graph data for a specific entity via a RESTful API
– Usually retrieve two levels of graph data
– Quick responsiveness supported by HBase
• With rowkey random access and an appropriate schema design
• HTable.get(), Scan.setStartRow(), Scan.setStopRow()
• Batch client
– Threat experts
– Pick one entity and the number of levels of interest, and generate a graph file in a format used by visualization tools
• To visualize and navigate whatever users are interested in
• Graph exploring tools
– Threat experts
– Find sub-graphs matching given criteria
• E.g., by number of levels or associated vertices
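A sketch of the real-time access pattern just described: a random-access Get for the vertex, then a bounded start/stop-row Scan over the edge-table prefix for its first level of neighbors. The stop-row sentinel is a common idiom assumed here, not taken from the deck:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class RealtimeClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        String entity = "myapps-ups.com||domain";

        // Level 0: random access to the vertex row.
        HTable vertexTable = new HTable(conf, "vertex");
        Result vertex = vertexTable.get(new Get(Bytes.toBytes(entity)));

        // Level 1: all outgoing edges share the '<vertex-row-key>-->' prefix,
        // so a start/stop-row scan fetches exactly that slice of the edge table.
        HTable edgeTable = new HTable(conf, "edge");
        Scan scan = new Scan();
        scan.setStartRow(Bytes.toBytes(entity + "-->"));
        scan.setStopRow(Bytes.toBytes(entity + "-->~")); // '~' sorts after the label bytes used here (assumption)
        ResultScanner edges = edgeTable.getScanner(scan);
        for (Result edge : edges) {
            System.out.println(Bytes.toString(edge.getRow()));
        }
        edges.close();
        vertexTable.close();
        edgeTable.close();
    }
}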
37. Malware Exploring Performance (3/3)
• Some statistics
– Mean: 51.61 ms
– Standard deviation: 653.57 ms
– Empirical rule: 68%, 95%, 99.7%
• 99.7% of requests complete below 2.1 seconds
• But response-time variance still happens
– Use a cache layer between the client and HBase
– Warm it up after new data comes in
40. • Human-readable HBase schema design
– Write your own MR
– Write your own Pig/UDFs
• So we can write algorithms to further process our graph data
– To predict unknown reputations from known threats
– E.g., a revised PageRank algorithm
41. Data process flow
[Diagram: the HBase source Graph table is snapshotted and cloned; algorithms (MR, Pig UDF) read the cloned tables, write intermediate data to HDFS, and the processed result is dumped back into the Graph table]
1. Dump daily data into the source table
2. Take a snapshot
3. Clone the snapshot
4. Process the data iteratively with algorithms (MR, Pig UDF); this takes hours
4.1 Generate intermediate data on HDFS
5. Processing completes
6. Dump the processed data, with a timerange, back into the Graph table
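The snapshot and clone steps (2 and 3) can be driven from the admin API; a sketch using the HBaseAdmin calls of that era, with illustrative table and snapshot names:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class SnapshotFlow {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Step 2: snapshot the source table without disturbing online serving.
        admin.snapshot("vertex-snap-20140505", "vertex");

        // Step 3: clone the snapshot into a working table the MR jobs can read,
        // isolating hours of iterative processing from the live Graph table.
        admin.cloneSnapshot("vertex-snap-20140505", "vertex-work-20140505");

        admin.close();
    }
}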
42. A customized TableInputFormat (1/2)
• One Mapper per region by default
– Each Mapper processes too much data
• OutOfMemoryException
• Too long to process
– Use a small split-region size?
• That will overload your HBase cluster!!
• Before: about ~40 Mappers
• After: about ~500 Mappers
43. A customized TableInputFormat (2/2)
[Diagram: 1. Run a “pick candidates” MR job → 2. Scan the cloned Graph table → 3. Output a candidates list file → 4. CustTableInputFormat loads the candidates → 5. Run the algorithm MR with more Mappers]
Candidates list file format (tab-separated):
<encodedRegionName>\t<startKey>\t<endKey>
…
d3d1749f3486e850b263c7ecb2424dd3\tstartKey_1\tendKey_1
d3d1749f3486e850b263c7ecb2424dd3\tstartKey_2\tendKey_2
d3d1749f3486e850b263c7ecb2424dd3\tstartKey_3\tendKey_3
Cd91c08d656a19bdb180e0b7f8896575\tstartKey_4\tendKey_4
Cd91c08d656a19bdb180e0b7f8896575\tstartKey_5\tendKey_5
…
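The crux of such a customized input format is overriding getSplits() so that one region yields several splits instead of one. A skeletal sketch against the 0.94-era mapreduce API; loading the actual candidate ranges from the list file is elided, so the region boundaries are reused as a stand-in:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableSplit;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;

public class CandidateTableInputFormat extends TableInputFormat {
    @Override
    public List<InputSplit> getSplits(JobContext context) throws IOException {
        List<InputSplit> splits = new ArrayList<InputSplit>();
        for (InputSplit s : super.getSplits(context)) {  // one split per region
            TableSplit region = (TableSplit) s;
            // Instead of one Mapper per region, emit one split per
            // (startKey, endKey) candidate range read from the candidates
            // file; here the region boundaries stand in for those ranges.
            splits.add(new TableSplit(region.getTableName(),
                                      region.getStartRow(),
                                      region.getEndRow(),
                                      region.getRegionLocation()));
        }
        return splits;
    }
}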
44. HGraph
• An open-source project on GitHub
– https://github.com/trendmicro/HGraph
• A partial implementation released from our internal project
– Follows the HBase schema design
– Reads data via the Blueprints API
– Processes data with our default PageRank implementation
• Download or ‘git clone’ it
– Build with ‘mvn clean package’
– Run on a unix-like OS
• Windows users may encounter errors
46. Experiment Result
• Testing dataset
– 42,133,610 vertices and 108,355,774 edges
– 1 vertex usually associates with 2–9 vertices
– 4.13% of the vertices are known bad
– 0.09% of the vertices are known good
– The rest are unknown
• Result
– Ran 34 hours for 23 iterations
– 1,291 unknown vertices were ranked out
– The top 200 have 99% accuracy (explained later)
48. Untested But Highly Malware-Related IP
• 67.*.*.132
– Categorized as “Computers / Internet”, not tested
https://www.virustotal.com/en/ip-address/67.*.*.132/information/
53. Property Graph Model Definition
• A property graph has these elements
– a set of vertices
• each vertex has a unique identifier.
• each vertex has a set of outgoing edges.
• each vertex has a set of incoming edges.
• each vertex has a collection of properties defined by a map from key to value.
– a set of edges
• each edge has a unique identifier.
• each edge has an outgoing tail vertex.
• each edge has an incoming head vertex.
• each edge has a label that denotes the type of relationship between its two vertices.
• each edge has a collection of properties defined by a map from key to value.
54. About regions
• Keep a reasonable number of regions on each regionserver
• Watch the regions split from one table
– Dumping data daily causes region splits
– Make sure your regions are scattered evenly across the regionservers
<hbase.regionserver.global.memstore.upperLimit> / <hbase.hregion.memstore.flush.size> = <active-regions-per-rs>
e.g. (10G * 0.4) / 128MB = 32 active regions — HBase Sizing Notes by Lars George