Big Data Connection presents: Big Data: Cause of Confusion - Bob Samuels
A high-level view of the confusing world of 'Big Data'. The mission of the non-profit American Institute of Big Data Professionals (AIBDP) is to provide structure and standards around terminology, proficiency, methodology, and expectations for Big Data.
Reinventing the Modern Information Pipeline: Paxata and MapR - Lilia Gutnik
(Presented at MapR's Big Data Everywhere event in Redwood City, CA in December 2016)
The relationship between business teams and IT has changed as the complexity of data has increased. A traditional data pipeline, built for an IT-centered approach to information management, cannot meet the data demands of today's business decisions. Designing a big data strategy requires modernizing previous approaches. Self-service data preparation in a collaborative, intuitive, governed, and secure environment is the key to a nimble and decisive business unit.
Open Source Framework for Deploying Data Science Models and Cloud Based Appli... - ETCenter
Next generation applications address more sophisticated questions that go beyond 'What happened?' by using machine learning and statistical modelling to answer 'Why?' and 'What will happen next?' Data insights can be easily deployed and rapidly delivered to decision makers via cloud-based applications. This framework focuses on technologies available for the entire data workflow, from ingestion and modeling to cloud deployment: Hadoop, MADlib, Python, R, CloudFoundry, etc. This presentation also includes examples of how this framework and innovative data science techniques have been applied across diverse business units within Media, including pricing analyses for ad optimization and predicting viewership.
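The shift the abstract describes, from descriptive "what happened?" reporting to predictive "what will happen next?" modelling, can be sketched in a few lines. This is a hypothetical illustration, not code from the presentation: a descriptive aggregate next to a simple least-squares trend forecast over weekly viewership counts (all names and numbers here are made up).

```python
# Hypothetical illustration: descriptive analytics ("what happened?")
# versus a minimal predictive step ("what will happen next?") using
# ordinary least squares on a series of weekly viewership counts.

def describe(views):
    """What happened? A descriptive aggregate over past periods."""
    return {"total": sum(views), "mean": sum(views) / len(views)}

def forecast_next(views):
    """What will happen next? Least-squares trend extrapolation."""
    n = len(views)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(views) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, views))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return intercept + slope * n  # predicted value for the next period

weekly_views = [100, 110, 120, 130]
print(describe(weekly_views))       # {'total': 460, 'mean': 115.0}
print(forecast_next(weekly_views))  # 140.0
```

A real deployment of the kind the talk covers would train such a model with R, Python, or MADlib and serve it from a cloud application, but the question being answered is the same.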
As anyone who has ever needed to understand the data model which underpins SAP or SAP BW to assist in a project will tell you, it is often a difficult, time-consuming and costly exercise, and accuracy is hard to ensure. In this webinar we explore the problem and discuss how Boeing, RS Components and Hydro Tasmania have used Safyr from Silwood Technology to meet this challenge while reducing the time, cost and risk associated with it.
When the IT department of a large US oil and gas company was tasked with improving the way in which vast amounts of data were analysed, manipulated and disseminated, it investigated a number of tools that would enable users to explore, document and visualise data structures for its large SAP® enterprise application, before deciding to implement Safyr.
A modern, flexible approach to Hadoop implementation incorporating innovations from HP Haven - DataWorks Summit
Jeff Veis
Vice President
HP Software Big Data
Gilles Noisette
Master Solution Architect
HP EMEA Big Data CoE
DesignMind is a technology consulting firm that develops Database, Business Intelligence, and Big Data solutions in San Francisco, Silicon Valley, and throughout the U.S.
This was presented at the SAS Visual Analytics Event on May 15, 2013 in Chennai. The presentation discussed how SAS Visual Analytics can empower your organisation to gain valuable insights from your data in the shortest amount of time.
UCSD: Building a Big Data Culture - It Takes a Village - Paul Barsch
Companies talk about the need to make decisions based on analytics, but there are people, process, technology, and strategy considerations to making it work. This presentation, given at UCSD in May 2017, discusses the journey companies take towards becoming "data-driven", including where they most often get stuck. Also discussed are the various roles required (e.g. data scientist, data engineer, data analyst and more) and the skills needed to succeed now and in the future. This presentation will show you how to stay relevant in an age of disruption by leveraging data to make the best decisions possible.
zData Inc. Big Data Consulting and Services - Overview and Summary - zData Inc.
This slide deck is a summary of zData Inc., a leading Big Data consulting and services provider. zData focuses on commercial and enterprise corporations, employing experts in all areas of the field, from software engineers to data scientists. They work with top hardware and software providers for on-site and off-site consulting, managed services, training, and long-term scalable data solutions.
[Strata NYC 2019] Turning big data into knowledge: Managing metadata and data... - Kaan Onuk
Discover how Uber thinks about building big data knowledge platforms to allow teams to discover, manage, and govern entities. Explore how to build an extensible metadata management platform and infrastructure to democratize data at Uber's scale.
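At its core, the kind of metadata platform the talk describes lets teams register data entities and then discover and govern them. The following is a minimal, hypothetical sketch of that idea; the class, field names, and tags are illustrative inventions, not Uber's actual API.

```python
# A minimal, hypothetical metadata registry: entities are registered with
# an owner, a schema, and governance tags, and can then be discovered by tag.

class MetadataCatalog:
    def __init__(self):
        self._entities = {}

    def register(self, name, owner, schema, tags=()):
        """Register a data entity so teams can discover and govern it."""
        self._entities[name] = {
            "owner": owner,
            "schema": schema,
            "tags": set(tags),
        }

    def search(self, tag):
        """Discover entities carrying a given governance tag."""
        return [n for n, e in self._entities.items() if tag in e["tags"]]

    def owner_of(self, name):
        """Governance: every entity has an accountable owner."""
        return self._entities[name]["owner"]

catalog = MetadataCatalog()
catalog.register("trips", owner="mobility-team",
                 schema={"trip_id": "string", "fare": "double"},
                 tags=["pii-free", "core"])
print(catalog.search("core"))    # ['trips']
print(catalog.owner_of("trips")) # mobility-team
```

A production platform adds lineage, freshness, and access control on top of this registry, but discovery-by-metadata is the common foundation.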
As users gain more experience with Hadoop, they are building on their early success and expanding the size and scope of Hadoop projects. Syncsort’s third annual Hadoop Market Adoption Survey reflects the fact that Hadoop is no longer considered a technology for the future as it was when we first started conducting this research.
Get an in-depth look at the survey results and five trends to watch for in 2017. You’ll also learn:
• The best uses for Hadoop in 2017 – real-world examples of how enterprises are realizing the value of Big Data
• Solutions to help you address the challenges enterprises still face in employing Hadoop
• What the future of Hadoop means for your business
AzureDay - Introduction to Big Data Analytics - Łukasz Grala
AzureDay North 2016, a conference about cloud solutions.
What is analytics? What is Big Data? Why do we have Big Data in the cloud? What does Microsoft offer for Big Data analytics? How do you start with Big Data analytics or advanced analytics? The session introduces the fundamentals of Big Data and advanced analytics.
By Data Scientist as a Service
Metadata discovery for enterprise packages - a better approach - Roland Bullivant
Safyr is a unique solution for helping companies accelerate and improve the quality of information management projects which involve packages from SAP, Oracle and Salesforce. Safyr does this by making their metadata available and understandable in a fraction of the time and cost it takes using traditional methods.
Big Data Day LA 2016 / Hadoop/Spark/Kafka track - Panel - Interactive Applic... - Data Con LA
In this interactive panel discussion, you will hear from these Spark experts as to why they chose to go "all-in" on Spark, leveraging the rich core capabilities that make Spark so exciting, and committing to significant IP that turns Spark into a world-class enterprise data preparation engine.
Raymond and David will explain specific cases where capabilities were built on top of core Spark to provide a true interactive data prep application experience. These innovations include a Domain Specific Language (DSL), an optimizing compiler, a persistent columnar caching layer, application-specific Resilient Distributed Datasets (RDDs), and on-line aggregation operators, which together overcome the core memory, pipelining, and shuffling obstacles to produce a highly interactive application with the user and data-volume scale-out benefits of Spark.
Benchmarking Digital Readiness: Moving at the Speed of the Market - Apigee | Google Cloud
Moving at the new speed of the market: benchmarking your digital readiness with real-world data
Companies are under pressure to move at the speed of digital natives. Benchmark your organization against empirical data and real-world case studies to see where you stand and what you can do to jumpstart your digital readiness.
You can watch the replay for this Geek Sync webcast in the IDERA Resource Center: http://ow.ly/sgto50A5d9J
Movement to the cloud is in full swing. Whether your company is considering Azure, Amazon Web Services, the Google Cloud, or any of the other providers, you will need a cloud strategy. However, with the cloud comes a different way of thinking. The good news is, you have a lot of options. The reality is, migrating to the cloud is not as simple as creating servers in the cloud and charging forward.
Join IDERA and Mike Fal as he discusses the different options for cloud migration, including Platform-as-a-Service, hybrid cloud footprints, disaster recovery, and many other facets of using the cloud. Attendees will get a high-level overview of what it means to move to the cloud and what to consider as they navigate their own cloud migration strategy.
About Mike: Mike Fal (@mike_fal) is a specialist in data management technologies. As a community advocate, public speaker, and blogger, Mike is a practicing thought leader for data and automation. He is passionate about DevOps and data, building platforms to optimize, protect, and use data efficiently. Mike has worked in the database field since 1999, focusing primarily on SQL Server, and specializes in automating data solutions to improve the reliability and efficiency of his environments. He currently works as a SQL Server consultant for UpSearch, LLC and has been caught playing trombone in public on more than one occasion.
Etu, a leading Hadoop product and solution provider in Asia, announced its "Five Trend Predictions for Taiwan's 2014 Big Data Market" at its annual Etu Solution Day (ESD). Etu also unveiled, for the first time, 21 proven Hadoop Big Data applications across 10 industries in Taiwan and mainland China, including operational analysis and customer-service queries in telecom, precision recommendation in e-commerce, content recommendation in digital media, user behaviour analysis in retail, data-warehouse workload offloading and process yield analysis in high-tech manufacturing, public sentiment analysis for government and real estate, energy management for the power sector, and management of huge volumes of small image files in insurance. Taiwan's Big Data market is expected to mature further in 2014; after the validation stage, the number of enterprises reaching final adoption is also expected to grow severalfold.
Etu head 蔣居裕 (Fred Chiang) said: "UDN's adoption shows that demand from Taiwanese enterprises for Big Data applications is clearly rising in specific industries, and the five trend predictions for Taiwan's 2014 Big Data market echo this view." He continued: "First, those who crossed the river earliest will begin to challenge the ocean of data value; the earlier they invested, the deeper they go, and the deeper they go, the broader. Second, Total Data BI is driving enterprises to adopt multi-structured data warehouses, with customer behaviour analysis, precision marketing, and customer experience as the application goals. Third, from integrating old and new systems to end-to-end solutions, most enterprises expect vendors to deliver complete Big Data applications along with professional technical consulting; 'ease' is the keyword for Big Data products entering the enterprise. Fourth, data exploration tools are on the rise, helping business users mine the value of Big Data even better than IT staff; 'discovery' is the essence of Big Data analysis: discovering correlations, discovering intent, and discovering what is missing. Fifth, Big Data training courses are rapidly expanding from processing technology to data analysis, all under the umbrella of 'data science'; true data scientists are one in ten thousand, so data science teams built on a professional division of labour are where the real hope of realizing data value lies."
ESD 2013 also showcased the Etu Ecosystem built around the Etu Appliance, demonstrating end-to-end solutions developed by Etu and its ISV partners. Etu Recommender, in addition to its original personalized precision recommendation, can now integrate with third-party tools for visual data exploration and for building user-behaviour analysis data warehouses. Partner solutions, such as 堂朝數位整合's cloud e-publication value-added platform, PilotTV's audience measurement system, 樺鼎商業資訊's visual analytics tool, and 衛信科技's complete SDN network management solution, use the Etu Appliance for massive, scalable file-format conversion, real-time processing and analysis of facial-recognition data, multi-structured data warehousing, and network packet pre-processing, respectively. What these solutions have in common is that they are all applications developed on, or integrated with, the repeatedly award-winning Etu Appliance.
Summary of Insights Learned from the Data Science Program Team Training - Fred Chiang
Who really has the skills and talents to leverage the most value out of data? The Data Science Program (DSP) was co-founded by Code for Tomorrow and Etu. We believe that building and deploying a data science team whose members bring, and can apply, different skill sets from a variety of industries is more practical and realistic than hoping to find an individual data scientist who is an expert in a wide variety of technical fields ranging from math, statistics, and visualization to business, communication, and more. The Data Science Program has identified four pertinent categories for its members: Campaigner, Data Analyst, Data Hygienist, and Designer. Each team has all four categories filled. During the training, every team learns how to do data processing, data analysis, and visualization together, with the sole purpose of using these skills to solve a common problem. After four weeks of intensive study, each team comes up with an enterprise-grade team project demonstrating the innovation of data-driven businesses.
After two rounds of DSP Team Training, DSP has accumulated 10 team projects and has graduated more than 60 alumni who are passionate about data science. During this journey of developing and deploying teams trained in data science, the most valuable aspects we walked away with was the witnessing of members growing in confidence from the learning and experience, the building of team work, and the overall growth of each individual. At the end of the day, our hope of as members of DSP, including myself is to instill and motivate more people to devote themselves to the exploration of data science. Now think about how you can do the same.
Opening Keynote for HadoopCon 2014
We are surrounded, both in daily life and online, by a great many Big Data narratives and technologies. The Hadoopers gathered here today are already stakeholders in Big Data. Yet most of what we understand about Big Data comes from our own experience, and the Hadoop ecosystem is vast and complex: different use cases require different open-source projects. Seen this way, which of us has ever taken in the entire Big Data landscape?
This talk shows how, through more windows onto the scenery, we can see the different worlds of Big Data more fully and more clearly.
There are patterns for things such as domain-driven design, enterprise architectures, continuous delivery, microservices, and many others.
But where are the data science and data engineering patterns?
Sometimes, data engineering reminds me of cowboy coding - many workarounds, immature technologies and lack of market best practices.
The Value of the Modern Data Architecture with Apache Hadoop and Teradata - Hortonworks
This webinar discusses why Apache Hadoop is most typically the technology underpinning "Big Data", how it fits into a modern data architecture, and the current landscape of databases and data warehouses that are already in use.
A modern data platform meets the needs of each type of data in your business - Marcos Quezada
For a little over 20 years our customers have confidently built the databases of their business-critical applications on robust commercial databases such as Oracle and DB2 on Power Systems. As the digital transformation of their companies evolves, driven by the migration to mobile and web platforms, they face the need to extract more value from their most precious asset: their data.
Many companies now need to begin exploring and exploiting other types and volumes of data; for them, Cognitive Systems presents solutions for a modern data platform based on key-value, document, graph, open-source, and parallel databases such as Hadoop.
Creating a Next-Generation Big Data Architecture - Perficient, Inc.
If you’ve spent time investigating Big Data, you quickly realize that the issues surrounding it are often complex to analyze and solve. The sheer volume, velocity, and variety change the way we think about data – including how enterprises approach data architecture.
Significant reduction in costs for processing, managing, and storing data, combined with the need for business agility and analytics, requires CIOs and enterprise architects to rethink their enterprise data architecture and develop a next-generation approach to solve the complexities of Big Data.
Creating the data architecture while integrating Big Data into the heart of the enterprise data architecture is a challenge. This webinar covered:
-Why Big Data capabilities must be strategically integrated into an enterprise’s data architecture
-How a next-generation architecture can be conceptualized
-The key components to a robust next generation architecture
-How to incrementally transition to a next generation data architecture
Extract business value by analyzing large volumes of multi-structured data from various sources such as databases, websites, blogs, social media, smart sensors...
Architecting Agile Data Applications for Scale - Databricks
Data analytics and reporting platforms have historically been rigid, monolithic, hard to change, and limited in their ability to scale up or down. I can’t tell you how many times I have heard a business user ask for something as simple as an additional column in a report, only for IT to say it will take six months because that column doesn’t exist in the data warehouse. As a former DBA, I can tell you the countless hours I have spent “tuning” SQL queries to hit pre-established SLAs. This talk covers how to architect modern data and analytics platforms in the cloud to support agility and scalability, including end-to-end data pipeline flow, data mesh and data catalogs, live data and streaming, performing advanced analytics, applying agile software development practices like CI/CD and testability to data applications, and finally taking advantage of the cloud for infinite scalability both up and down.
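One concrete piece of the agility argument above is testability: if each pipeline stage is a pure function, CI/CD can verify it on every commit without a cluster or production data. This is a hedged sketch of that practice; the stage, column names, and sample data are invented for illustration.

```python
# A pipeline stage written as a pure function over plain records, so a
# unit test can exercise it in CI long before the job touches real data.

def add_revenue_column(rows):
    """One pipeline stage: derive revenue = units * price for each record."""
    return [{**r, "revenue": r["units"] * r["price"]} for r in rows]

# The kind of check a CI pipeline runs on every commit.
sample = [{"units": 3, "price": 2.5}, {"units": 0, "price": 9.99}]
result = add_revenue_column(sample)
assert result[0]["revenue"] == 7.5
assert result[1]["revenue"] == 0
print("stage test passed")
```

Adding the proverbial "one more column" then becomes a small, tested change to one function rather than a six-month data warehouse project.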
SAP Data Hub and SUSE Container as a Service Platform - SUSE Italy
SAP Data Hub is a solution for the integration, orchestration, and governance of data of any type, variety, and volume; it uses Kubernetes as its platform and is certified on SUSE CaaS Platform.
In this session SAP and SUSE present an overview of the main features and benefits of integrating the two solutions. (Nicola Bertini, SAP Italia and SUSE)
We recently presented our technology solution for metadata discovery to the Boulder Business Intelligence Brains Trust in Colorado. (www.bbbt.us)
The whole session was also recorded, and there is a link to the recording at the end of the presentation.
A seasoned data professional with more than 25 years of experience building and managing practices and global delivery in Big Data analytics, Big Data migration from on-premise to GCP and Azure, EDW & BI, business analytics, SAP HANA, predictive analytics, data QA, automation of solutions, Big Data frameworks and methodologies, and data product development.
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016 - StampedeCon
This session will detail best practices for architecting, building, operating and managing an Analytics Data Lake platform. Key topics will include:
1) Defining next-generation Data Lake architectures. The de facto standard has been commodity DAS servers with HDFS, but there are now multiple solutions aimed at separating compute and storage, virtualizing or containerizing Hadoop applications, and utilizing Hadoop-compatible or embedded HDFS filesystems. This portion will explore the options available, and the pros and cons of each.
2) Data Ingest. There are many ways to load data into a Data Lake, including standardized Apache tools (Sqoop, Flume, Kafka, Storm, Spark, NiFi), standard file and object protocols (SFTP, NFS, REST, WebHDFS), and proprietary tools (e.g., Zaloni Bedrock, DataTorrent). This section will explore these options in the context of best fit to workflows; it will also look at key gaps and challenges, particularly in the areas of data formats and integration with metadata/cataloging tools.
3) Metadata & Cataloguing. One of the biggest inhibitors of successful Data Lake deployments is Data Governance, particularly in the areas of indexing, cataloguing and metadata management. It is nearly impossible to run analytics on top of a Data Lake and get meaningful & timely results without solving these problems. This portion will explore both emerging open standards (Apache Atlas, HCatalog) and proprietary tools (Cloudera Navigator, Zaloni Bedrock/Mica, Informatica Metadata Manager), and balance the pros, cons and gaps of each.
4) Security & Access Controls. Solving these challenges are key for adoption in regulatory driven industries like Healthcare & Financial Services. There are multiple Apache projects and proprietary tools to address this, but the challenge is making security and access controls consistent across the entire application and infrastructure stack, and over the data lifecycle, and being able to audit this in the face of legal challenges. This portion will explore available options and best practices.
5) Provisioning & Workflow Management. The real promise of the Data Lake is integrating analytics workflows and tools on converged infrastructure, with shared data, and building "as a service" architectures oriented towards self-service data exploration and analytics for end users. This is an emerging and immature area, but this session will explore some potential concepts, tools and options to achieve this.
This will be a moderately technical session, with the above topics being illustrated by real world examples. Attendees should have basic familiarity with Hadoop and the associated Apache projects.
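The session's metadata and cataloguing point can be made concrete with a minimal example of what a catalog entry has to capture for lake data to remain findable: location, format, schema, and lineage. This is an illustrative sketch only; the dataset path, field names, and tool names are hypothetical, not taken from Atlas, HCatalog, or any vendor product.

```python
# Illustrative only: a minimal catalog record for one dataset in a lake.
# Without at least this much metadata, data dropped into HDFS or object
# storage quickly becomes undiscoverable.

import json

entry = {
    "dataset": "raw/clickstream/2016-07-01",   # hypothetical lake path
    "format": "parquet",
    "schema": [
        {"name": "user_id", "type": "string"},
        {"name": "ts", "type": "timestamp"},
    ],
    "source": "kafka:clickstream",  # lineage: where the data was ingested from
    "ingested_by": "nifi",          # which ingest tool wrote it
    "pii": True,                    # governance flag for access controls
}

# Serialize the record as the catalog service would store or exchange it.
print(json.dumps(entry, indent=2))
```

Tools like Apache Atlas or Cloudera Navigator manage records of roughly this shape at scale, adding search, audit, and lifecycle tracking on top.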
Accelerate Your Big Data Analytics Efforts with SAS and Hadoop - DataWorks Summit
Analytics and machine learning continue to be the top use cases for deploying big data platforms such as Hadoop. SAS recognised the potential and power of Hadoop platform early on and has been integrating analytical solutions with Hadoop to leverage the power and flexibility of Hadoop for analytical workloads. The combination of SAS and Hadoop offers developers and organisations an approach that can accelerate the development and deployment of big data analytics applications that are mature, proven and scalable. Furthermore, by giving developers and analysts analytical applications that are rich, proven and collaborative, SAS allows more users across different skill levels to unleash the value of data stored in big data platform more easily and quickly.
In this session, we will cover common big data analytics use cases, the depth and breadth of SAS analytical capabilities on Hadoop, and how SAS solutions are integrated into the Hadoop ecosystem via technologies such as Hive, YARN and Spark.
Speaker
Felix Liao, SAS Institute Australia & New Zealand
Demystify big data data science
An overview of the shift to Data Science Platforms
The 3 critical components of a Data Science platform
Industries that are most likely to get disrupted and shift to Data Science
Characteristics of firms that get left behind the Data Science wave
Factors that push an industry towards Data Science
A brief overview of aspects of platform architecture beyond technology
The Common BI/Big Data Challenges and Solutions presented by seasoned experts, Andriy Zabavskyy (BI Architect) and Serhiy Haziyev (Director of Software Architecture).
This was a complimentary workshop where attendees had the opportunity to learn, network and share knowledge during the lunch and education session.
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
UiPath Test Automation using UiPath Test Suite series, part 4 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... – Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes much work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... – DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses.
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
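To give a feel for what a power-flow engine like PowSyBl computes, here is a from-scratch toy DC power flow on a three-bus network in plain Python. This is a sketch of the underlying method only, not the PowSyBl/pypowsybl API, and the network data (susceptances, injections) is invented for illustration:

```python
# Toy DC power flow on a 3-bus network: solve B * theta = p for bus angles,
# then derive line flows from angle differences. Bus 0 is the slack bus.

# Line data: (from_bus, to_bus, susceptance in per-unit) -- invented values.
lines = [(0, 1, 10.0), (0, 2, 8.0), (1, 2, 5.0)]
# Net injections at the non-slack buses (per-unit): bus 1 generates, bus 2 consumes.
p = {1: 0.9, 2: -1.5}

# Build the reduced susceptance matrix B for buses 1 and 2.
B = [[0.0, 0.0], [0.0, 0.0]]
for f, t, b in lines:
    for bus in (f, t):
        if bus != 0:
            B[bus - 1][bus - 1] += b
    if f != 0 and t != 0:
        B[f - 1][t - 1] -= b
        B[t - 1][f - 1] -= b

# Solve the 2x2 system B * theta = p with Cramer's rule.
det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
theta = {0: 0.0,
         1: (p[1] * B[1][1] - p[2] * B[0][1]) / det,
         2: (B[0][0] * p[2] - B[1][0] * p[1]) / det}

# Line flows follow from angle differences: P_ft = b * (theta_f - theta_t).
flows = {(f, t): b * (theta[f] - theta[t]) for f, t, b in lines}
print(flows)
```

Real engines solve the full non-linear AC equations iteratively over networks with thousands of buses; the linear DC approximation above is just the simplest member of that family.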
UiPath Test Automation using UiPath Test Suite series, part 3 – DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Speakers:
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Essentials of Automations: Optimizing FME Workflows with Parameters – Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... – UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Accelerate your Kubernetes clusters with Varnish Caching – Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
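For a flavour of what such a deployment configures, a minimal VCL snippet for caching an in-cluster service might look like the following. The backend hostname and TTL are invented for illustration; in a Helm-based deployment these values would normally come from the chart's configuration rather than hand-written VCL:

```vcl
vcl 4.1;

# Hypothetical in-cluster backend, addressed via its Kubernetes Service DNS name.
backend default {
    .host = "my-app.default.svc.cluster.local";
    .port = "8080";
}

sub vcl_backend_response {
    # Cache successful responses for two minutes unless the origin overrides it.
    if (beresp.status == 200) {
        set beresp.ttl = 120s;
    }
}
```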
Transcript: Selling digital books in 2024: Insights from industry leaders - T... – BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 – Tobias Schneck
As AI technology pushes into IT, I found myself wondering, as an "infrastructure container Kubernetes guy", how this fancy AI technology gets managed from an infrastructure operations point of view. Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide you with a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply them to our own infrastructure and make them work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I already have working for real.
DevOps and Testing slides at DASA Connect – Kari Kakkonen
Slides by me and Rik Marselis at the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We also ran a lovely workshop in which the participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
JMeter webinar - integration with InfluxDB and Grafana – RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
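For reference, the JMeter-to-InfluxDB integration shown in the webinar is typically done with JMeter's built-in InfluxDB Backend Listener, configured as name/value parameters in the test plan. The parameter names below are the listener's standard ones; the host, database, and application values are illustrative placeholders:

```properties
# Backend Listener implementation:
#   org.apache.jmeter.visualizers.backend.influxdb.InfluxdbBackendListenerClient
influxdbMetricsSender = org.apache.jmeter.visualizers.backend.influxdb.HttpMetricsSender
influxdbUrl           = http://localhost:8086/write?db=jmeter   # illustrative host/db
application           = my-app     # illustrative tag, used for filtering in Grafana
measurement           = jmeter
summaryOnly           = false
samplersRegex         = .*
percentiles           = 90;95;99
```

Grafana then reads the `jmeter` measurement from InfluxDB as a data source for its dashboards.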
Big data connection overview by aibdp.org
1. Welcome
• Thank you: Francis - Silicon Valley Strategy, Innovation and Product Management group
• Thank you: Michael & Sam and the Microsoft Store
• Thank you: Aleks & David & SAP HANA
• Thank you: All of You… You are the 'Secret Sauce'
2. Agenda
• Quick Poll
• Overview – AIBDP / Big Data Connection
• Prasad Mavuduri – Board Member, AIBDP – "Demystifying Big Data"
• David Sonnenschein – Vice President & Aleks Swerdlow – Community Manager – SAP Labs – "HANA In-Memory – Start-ups Success Stories"
• Networking & Q&A
3. Quick Poll
• Relationship & Experience w/ Big Data
• Job Role
• Industry
• Company Years - Start-up?
• Big Data Implementation Status
• Biggest Challenges / Opportunities
• Vs Competitors?
4. Overview - Big Data Connections
Mission: Demystify Big Data
– Five E's – entertain, engage, educate, etc.
– Focus on Solutions (vs technology)
– Focus on Specific Verticals (ex Healthcare, Risk, eCom/eMarketing, Manufacturing, Logistics, Telecom…)
– Best Practices Case Study Reviews
– Networking & Shared Learning
– Sponsored by the American Institute of Big Data Professionals (AIBDP.org)
– Sponsored by Big Data consulting firm, Data-Magnum
5. THE CONFUSING WORLD OF BIG DATA
[Diagram: a landscape map of Applications, Tools, and Data Management layers across Structured and Unstructured data. Applications/tools: BI platform/reporting, OSS visualizations, unstructured search/indexing/metadata, NLP, text/sentiment analysis, Hadoop analytics, Hadoop dev platforms/automation, predictive analytics, vertical market applications, messaging, optimization, data integration/CEP, Impala, Drill, EMR. Data management: transactional DBs, high-performance analytical DBs, NewSQL, distributed NoSQL (graph, document, key-value/column), data warehouses, IMDG, Hadoop/HDFS and HDFS alternatives, Hadoop-as-a-Service, DBaaS, Data as a Service, HANA, GraphDB, vFabric, Redshift, filesystems. Data sources: enterprise apps, internet apps, social media, web content, mobile devices, camera/DVR, sensors/RFID, logfiles.]
Based on Source: Perella Weinberg Partners
17. It can be made more complicated…
o Hadoop
o NoSQL
o NewSQL
o Structured Databases
o NGDW (next generation data warehouse)
o Cloud Services
o Technical Services
o Professional Services
o Distributors
o Deployment services
o Deployment stack/appliances
o Development services
o Application stacks
o Database stacks
o Managed Monitoring
o Storage
o Security
Source: Sqrrl: To simplify the NoSQL world, let's take a look at the top 3 databases in terms of current popularity and how they compare to Apache Accumulo, which is at the core of our product, Sqrrl Enterprise.

MongoDB: It is a wonderfully easy-to-use document store that many select as a flexible replacement for a SQL database, as it (like all NoSQL databases) does not require pre-defined schemas. However, MongoDB has difficulty scaling to very large datasets (e.g., 100+ TB) and does not natively work with your Hadoop cluster. It also does not possess fine-grained security controls.

Cassandra: This is an excellent choice if your data is too big for MongoDB and you require multi-datacenter replication. Although Cassandra was not originally designed to run natively on your Hadoop cluster, it now has integrations with MapReduce, Pig, and Hive. It does not possess fine-grained security controls.

HBase: HBase natively integrates with Hadoop, and it can handle very large datasets. However, it does not have fine-grained security controls.

Accumulo: Accumulo has an architecture most similar to HBase, which allows it also to natively plug into your Hadoop cluster. It is far more scalable than MongoDB, and with reported cluster sizes in the multiple thousands within the Intelligence Community, it is also significantly more scalable than HBase and Cassandra. Accumulo is the only NoSQL database with cell-level security capabilities. Accumulo also has other features that could lead one to choose it over HBase or Cassandra for reasons other than security or scalability. For example, Accumulo has a powerful server-side programming mechanism called Iterators, which provides it with the capability to do a variety of real-time aggregations and analytics.

These high-level differences between MongoDB, Cassandra, HBase, and Accumulo are summarized in the decision tree diagram below. Of course, there are a wide variety of more detailed technical differences that will be explored in greater detail in a later post. This decision tree can be summarized with a few simple statements:
- If you need a quick, simple solution and have "small" Big Data (e.g., a few dozen terabytes), MongoDB may be the answer.
- If you need cell-level security or multi-petabyte scalability, Accumulo is the right answer.
- If you have data that is too big for MongoDB and don't need cell-level security or massive scalability, we would recommend testing HBase, Cassandra, and Accumulo for your specific workloads. Each has its own nuanced advantages and disadvantages.
- If you don't need real-time analytics, you are probably on the wrong decision tree and can stick with the Hadoop Distributed File System and batch analytics.

It is worth noting that the NoSQL databases above are all open source databases. Sqrrl Enterprise builds upon Accumulo and adds a number of additional features, including streaming ingest, JSON, encryption, identity management integrations, full-text search, SQL queries, graph search, and statistics. We believe that these features set Sqrrl Enterprise apart from other Big Data platforms.
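The decision tree described in these notes can be sketched as a small Python function. The terabyte thresholds and return labels paraphrase the text above and are illustrative, not exact figures from the source:

```python
def pick_nosql_store(data_tb, needs_cell_security, needs_realtime):
    """Return a shortlist of candidate stores for the stated requirements."""
    if not needs_realtime:
        # No real-time analytics needed: plain HDFS plus batch analytics suffices.
        return ["HDFS + batch analytics"]
    if needs_cell_security or data_tb >= 1000:
        # Cell-level security or multi-petabyte scale point to Accumulo.
        return ["Accumulo"]
    if data_tb < 100:
        # "Small" Big Data (a few dozen TB): a simple document store may do.
        return ["MongoDB"]
    # Too big for MongoDB, no special security/scale needs: benchmark these.
    return ["HBase", "Cassandra", "Accumulo"]

print(pick_nosql_store(30, False, True))    # ['MongoDB']
print(pick_nosql_store(500, False, True))   # ['HBase', 'Cassandra', 'Accumulo']
```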
http://www.capgemini.com/blog/capping-it-off/2012/09/big-data-vendors-and-technologies-the-list

Big Data Vendors and Technologies:
- Data Acquisition stream – technological providers: Ab Initio, HP, IBM (DataStage, Streams, Data Mirror), Informatica (PowerCenter, PowerExchange, CEP), Kalido, Microsoft, Numenta, Oracle, SAP, SAS, Splunk, Syncsort, Talend, Tibco.
- Data Providers: ComScore, Datasift, Experian, Factual, GfK, Gnip, IMS, Inrix, Kaggle, Knoema, LexisNexis, Microsoft (with their Windows Azure Marketplace data market), Nielsen, Reuters, Salesforce Radian6, Symphony IRI, social network websites like Facebook, Google+, LinkedIn, Tumblr, Twitter or Viadeo, and all the Open Data providers, like governments, regions, etc.
- Marshalling domain – Very Large Data Warehousing and BI Appliances: Actian, ParAccel, EMC² (Greenplum), HP (Vertica), IBM (Netezza), Kognitio, Microsoft (SQL 2012 and PDW), Oracle (Exadata), SAP (HANA and Sybase IQ), SAS, Teradata.
- NoSQL Domain – main technologies and vendors: Amazon (as cloud provider or with their own NoSQL solution), Cassandra, Cloudera (CDH, Hadoop distribution), CouchDB, EMC², Google, Hadoop (of course), Hortonworks (Hadoop distribution), HP, IBM, KX, MapR (Hadoop distribution), MarkLogic, Microsoft (Hadoop on Windows and Azure), MongoDB, Neo4J, Oracle, Palantir, Snaplogic, Sparsity, Splunk, Teradata (Aster Data), ZL Technologies.
- Content Management Space: Adobe, Alfresco, EMC² (Documentum), IBM (FileNet), HP (Autonomy), Microsoft, OpenText, Oracle.
- Analytics phase – predictive technologies (such as data mining) and vendors: Adobe, EMC², GoodData, Hadoop MapReduce, HP, IBM (SPSS), Karmasphere, Kxen, Microsoft, Mzinga, Oracle, R, Salesforce, SAS, SAP (R on HANA) and Teradata (Aprimo).
- Data Virtualization (and data federation): currently led by Composite, Denodo, HP (IDOL), IBM, Informatica, Microsoft, Oracle (Exalytics), SAP and Teiid (JBoss community).
- BI Tools Vendors: Actuate, Dassault Systèmes (Exalead), Domo, Esri, GoodData, Google, HP (Autonomy), IBM (Cognos suite), Information Builders, LogiXML (LogiAnalytics), Microsoft (SQL 2012), Microstrategy, NeutrinoBI, Oracle (OBI Foundation), Panopticon, Panorama, Pentaho, Qlikview, Roambi, SAP (BI4 suite), SAS, SpagoBI, Tableau, TIBCO Spotfire.
- Action Phase – the Data Acquisition providers plus the ERP, CRM and BPM actors: Adobe, Eloqua, EMC², IBM, iGrafx, Microsoft, OpenText, Oracle, Pega, Progress Software, SAP, Salesforce, Software AG, Teradata (Aprimo), Tibco.
- Data Governance area – Master Data Management (MDM), metadata and data quality tools: Adaptive, HP, IBM, Informatica, Kalido, Microsoft, Oracle, Orchestra Networks, SAP, SAS, Talend, Tibco.

Note that the Complex Event Processing (CEP) tools are part of the Acquisition (streaming data acquisition), Marshalling (e.g. in-memory storage as data is used or compared immediately) and Analytics (e.g. monitoring functions to detect abnormal activity) streams. Note that the BI tools are part of the Analytics (computing Key Performance Indicators) and Action (e.g. creating alerts in a push mode, by mail for instance) streams.
Citrusleaf = Aerospike. Couchbase – roots are in NorthScale – Membase … CouchDB; two focus audiences – Enterprise & funnel.
Analytics Infrastructure = MPP – distributed, open-source, Apache-licensed distribution of Apache Hadoop … open source, Massively Parallel Processing (MPP) query engine.
Infrastructure as a Service = Cloud IaaS.
Operational Infrastructure = structure of data – e.g. JSON; ad-hoc queries; unstructured data; behavioral, redundancy.
Not listed – Hardware / Storage – NetApp, EMC, HP.
Per Forbes (per Wikibon), Big Data is an $18 billion industry heading to $50 billion in five years. The companies in the inner circle (e.g. MapR, Cloudera, Splunk, Couchbase, etc.) are pure plays within Big Data. A theory is that these inner-circle players will probably get gobbled up by the big boys on the outside, who are just starting to play in the Big Data space (like SAP, Microsoft, Oracle, IBM…). In the meantime, the relative sizes of the circles reflect the relative sizes of the companies in terms of revenue. The percentages reflect the share of their current business that is 'big data'.
5/18/13 w/ Paul Hofmann:
- Palantir – just text; just Homeland Security
- Oracle Endeca – added
- HP Autonomy – added
- Attivio (partner with TIBCO) – added
- Saffron – Semantec and … (risk predictive) – added
- 0xData – changed logo
- Mu Sigma – consultant only
- Recorded Future – timeline; Opera – text-only? No predictive analytics?
- Kxen – nice company
- SAS – dead? Not scalable; Skytree – a platform / toolbox… you need to have your own data quant to create your own analytics
- Sociocast – Saffron partner
- Digital Reasoning – strong with Dept of Defense too
NoSQL databases currently available include:
- HBase (Apache)
- Cassandra (DataStax)
- MarkLogic (MarkLogic)
- Aerospike (Citrusleaf)
- MongoDB (10gen)
- Accumulo (Apache)
- Riak (Basho)
- CouchDB (CouchBase)
- DynamoDB (Amazon)
- Sqrrl (?)
- VoltDB (?)

http://thinkbiganalytics.com/leading_big_data_technologies/nosql/

NoSQL
NoSQL is an umbrella term for a broad class of database management systems that relax some of the traditional design constraints of relational database management systems (RDBMS) in order to meet goals of more cost-effective scalability, flexible tradeoffs of availability vs. consistency (as described by the CAP theorem), and flexibility for data structures that don't fit well into the relational model, such as key-value data and large graphs. NoSQL databases typically don't offer ACID transactions nor full SQL dialects.

The NoSQL ecosystem is very large. Among the better-known databases are HBase, Cassandra, Aerospike, DynamoDB, MongoDB, Riak, Redis, Accumulo, Datomic, and Couchbase. Of these, HBase and Accumulo are more closely tied to Hadoop than the others, as both use HDFS, by default, for persistent storage and Zookeeper for service federation.

NoSQL databases expose different information models, including key-value records, JSON or XML documents as records, or graph-oriented data. They expose corresponding programmer APIs and sometimes custom query languages that may or may not be SQL-based. However, a recent trend in this industry is the re-introduction of restricted SQL dialects to support the large user community accustomed to SQL, and improving support for transactions.

As an example of a scenario where a NoSQL database is a good fit, an event log for a web site might be captured in a key-value store, where fast appends and key-based retrievals are required, but not updates nor joins.

HBase
HBase is a distributed, column-oriented database, where each cell is versioned (a configurable number of previous values is retained). HBase provides Bigtable-like capabilities on top of Hadoop. SQL queries (but not updates) are supported using Hive, but with high latency. Eventually, Impala will also support Hive queries with lower latency. Like many NoSQL databases, HBase does not support complex transactions, SQL, or ACID transactions. However, HBase offers high read and write performance and is used in several large applications, such as Facebook's Messaging Platform. By default, HBase uses HDFS for durable storage, but it layers on top of this storage fast record-level queries and updates, which "raw" HDFS doesn't support. Hence, HBase is useful when fast, record-level queries and updates are required, but storage in HDFS is desired for use with Pig, Hive, or other MapReduce-based tools.

Cassandra
Cassandra is the most popular NoSQL database for very large data sets. It is a key-value, clustered database that uses column-oriented storage, sharding by key ranges, and redundant storage for scalability in both data sizes and read/write performance, as well as resiliency against "hot" nodes and node failures. Cassandra has configurable consistency vs. availability (CAP theorem) tradeoffs, such as a tunable quorum model for writes.

MongoDB
MongoDB is a document-oriented NoSQL database where each record is a JSON document. It has a rich, Javascript-based query language that exploits the implicit structure of JSON. MongoDB supports sharding for improved scalability and resilience. It is most popular for small to large data sets and less commonly used for very large data sets.

DynamoDB
DynamoDB is Amazon's highly scalable and available, key-value, NoSQL database. DynamoDB was one of the earliest NoSQL databases, and papers written about it influenced the design of many other NoSQL databases, such as Cassandra.

Couchbase
Couchbase is a key-value NoSQL database that is well suited for mobile applications where a copy of a data set is resident on many devices, where changes can be performed on any copy, and copies are synchronized when connectivity is available. Think of how an email client works with local copies of your email history and corresponding email servers.

Redis
Redis is a key-value store with specific support for fundamental data structures as values, including strings, hash maps, lists, sets, and sorted sets, whereas most key-value stores have limited understanding of a value's meaning, except to represent the value as column cells in many cases. For this reason, Redis is sometimes called a data structure server. Redis keeps all data in memory, which improves performance but limits the data set sizes it can manage. Durability is optional, by periodic flushing to disk or writing updates to an append log. Master-slave replication is also supported.

Datomic
Datomic is a newer entrant in the NoSQL landscape with a unique data model that remembers the state of the database at all points in the past, making historical reconstruction of events and state trivial. Many standard database operations are supported, including joins and ACID transactions. Deployments are distributed, elastic, and highly available.

Riak
Riak is a fault-tolerant, distributed, key-value NoSQL database designed for large-scale deployments in cloud or hosted environments. A Riak database is masterless, with no single points of failure. It is resilient against the failure of multiple nodes, and nodes can be added or removed easily. Riak is also optimized for read- and write-intensive applications.
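The "each cell is versioned" behaviour described for HBase can be illustrated with a tiny pure-Python sketch. This is a conceptual model of versioned cells only, not an HBase client API; the class name and column notation are invented for illustration:

```python
from collections import defaultdict

class VersionedTable:
    """Toy HBase-style store: each (row, column) cell keeps the N newest values."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.cells = defaultdict(list)  # (row, column) -> [newest, ..., oldest]

    def put(self, row, column, value):
        versions = self.cells[(row, column)]
        versions.insert(0, value)
        del versions[self.max_versions:]  # drop versions beyond the retention limit

    def get(self, row, column, version=0):
        """version=0 is the latest value, 1 the previous one, and so on."""
        return self.cells[(row, column)][version]

t = VersionedTable(max_versions=2)
for v in ("a", "b", "c"):
    t.put("row1", "cf:col", v)
print(t.get("row1", "cf:col"))      # latest value: 'c'
print(t.get("row1", "cf:col", 1))   # previous value: 'b' ('a' was dropped)
```

Real HBase additionally timestamps every version and distributes cells across region servers backed by HDFS; the retention-and-lookup behaviour is what this sketch demonstrates.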