Big Data in the Cloud with Azure Marketplace Images - Mark Kromer
Here are some of the trends that I'm seeing from customers looking to build Azure-based Cloud Big Data solutions using images from the Azure Marketplace
This was a very interesting conference, oriented to ICT students, walking them through the Azure ecosystem for data warehousing architecture and best practices to build powerful Business Intelligence solutions for the new era
Operationalizing Machine Learning at Scale at Starbucks - Databricks
As ML-driven innovations are propelled by self-service capabilities in the enterprise data and analytics platform, teams face a significant entry barrier and productivity issues in moving from POCs to operating ML-powered apps at scale in production.
I presented on AWS Big Data Analytics technologies and discussed how AWS provides a big data platform for collecting, storing, and analyzing data, and how to use AWS services for data streaming and big data, along with step-by-step demos of building big data solutions using Amazon EMR and Amazon Redshift.
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data... - Databricks
How did Devon move from a traditional reporting and data warehouse approach to a modern data lake? What did it take to go from a slow and brittle technical landscape to a flexible, scalable, and agile platform? In the past, Devon addressed data solutions in dozens of ways depending on the user and the requirements. Through a visionary program, driven by Databricks, Devon has begun a transformation of how it consumes data and enables engineers, analysts, and IT developers to deliver data-driven solutions along all levels of the data analytics spectrum. We will share the vision, technical architecture, influential decisions, and lessons learned from our journey. Join us to hear the unique Databricks success story at Devon.
Pentaho Big Data Analytics with Vertica and Hadoop - Mark Kromer
Overview of the Pentaho Big Data Analytics Suite from the Pentaho + Vertica presentation at Big Data Techcon 2014 in Boston for the session called "The Ultimate Selfie | Picture Yourself with the Fastest Analytics on Hadoop with HP Vertica and Pentaho"
Machine Learning and The Big Data Revolution - Rob Thomas
Data is transforming every industry, whether you are a retailer, a financial services firm, a physician, or a farmer. The winners in the data era will be those who can move fastest along the big data maturity curve. There are three business models for the data era, and every organization must make a conscious decision on which one it chooses.
Companies achieve big data leadership by rapidly transforming their skills and learning how to automate the application of analytics through machine learning. The Big Data Revolution will highlight the winners, describe why they are winning, and offer a practical approach for accelerating your organization to Big Data leadership.
z Systems redefining Enterprise IT for digital business - Alain Poquillon, NRB
IBM z Systems with the new z13 is the backbone infrastructure for the evolving digital era. Built on over 50 years of experience and billions of dollars in developing leading-edge technology, it is at the forefront of modern information technology across several domains. Mr. Poquillon illustrates IBM's z13 pre-eminence by highlighting assets such as its shared-everything approach and centralized management of resources that make it a natural fit for cloud; its hybrid transaction/analytics processing capabilities that deliver real-time analytics on in-process transactional data more efficiently; and its ability to provide the scale and performance a business needs to survive the mobile and social onslaught.
Machine learning is to the 21st century what the Industrial Revolution was to the 18th century. We are entering the era of Continuous Intelligence. http://www.forbes.com/sites/ibm/2017/02/15/machine-learning-ushers-in-a-world-of-continuous-intelligence/#246de3604c62
Presented at IZEAfest in Orlando, FL
Social network and sharing analysis, including:
+Document analysis at scale: Meme tracking combined with other variables like sentiment and bias
+Social network at scale: Information cascades and virality, inference of social networks given meme-like information as contagions
+The node level perspective and its effects on what an individual sees and shares: Illusions, effort and overload, topics, personality and demographics
+Personas and segmentation: Grouping based on demographics and interests
Building a performing Machine Learning model from A to Z - Charles Vestur
A 1-hour read to become highly knowledgeable about machine learning and the machinery underneath, from scratch!
A presentation introducing all the fundamental concepts of machine learning step by step, following a classical approach to building a performing model. Simple examples and illustrations are used throughout to make the concepts easier to grasp.
Apache Spark – The New Enterprise Backbone for ETL, Batch Processing and Real... - Impetus Technologies
In spite of investments in big data lakes, expensive proprietary products are still widely used for data ingestion, integration, and transformation (ETL) when bringing data onto the lake and processing it.
Enterprises have successfully tested Apache Spark for its versatility and strengths as a distributed computing framework that can handle all their data processing, analytics, and machine learning workloads.
Since the Hadoop distributions and the public cloud already include Apache Spark, there is nothing new to be procured. However, the skills required to put Spark to good use are typically unavailable today.
In this webinar, we will discuss how Apache Spark can be an inexpensive enterprise backbone for all types of data processing workloads. We will also demo how a visual framework on top of Apache Spark makes it much more viable.
The following scenarios will be covered:
On-Prem
Data quality and ETL with Apache Spark using pre-built operators
Advanced monitoring of Spark pipelines
On Cloud
Visual interactive development of Apache Spark Structured Streaming pipelines
IoT use-case with event-time, late-arrival and watermarks
Python based predictive analytics running on Spark
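The event-time, late-arrival, and watermark behaviour listed above can be sketched in plain Python. This is a toy aggregator showing the idea behind Spark's watermarking, not Spark's actual implementation; the class name and parameters are illustrative:

```python
from collections import defaultdict

def window_start(event_time, window_size):
    """Floor an event-time (in seconds) to the start of its tumbling window."""
    return event_time - (event_time % window_size)

class WatermarkAggregator:
    """Toy event-time windowed counter with a watermark.

    Late events are accepted as long as their window has not been closed;
    events whose window falls entirely before (max event time - delay)
    are dropped, which is the essence of withWatermark() in Spark.
    """
    def __init__(self, window_size, delay):
        self.window_size = window_size
        self.delay = delay
        self.max_event_time = 0
        self.counts = defaultdict(int)

    def process(self, event_time):
        self.max_event_time = max(self.max_event_time, event_time)
        watermark = self.max_event_time - self.delay
        if event_time < window_start(watermark, self.window_size):
            return False  # too late: window already closed by the watermark
        self.counts[window_start(event_time, self.window_size)] += 1
        return True

agg = WatermarkAggregator(window_size=60, delay=30)
agg.process(100)   # accepted into window [60, 120)
agg.process(130)   # accepted into window [120, 180); watermark is now 100
agg.process(10)    # dropped: window [0, 60) is already closed
```

A real Structured Streaming pipeline would express the same logic declaratively with `withWatermark` and `groupBy(window(...))`.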
Deploy Apache Spark™ on Rackspace OnMetal™ for Cloud Big Data Platform - Rackspace
There's an elephant in the room when it comes to Big Data. While Apache Hadoop and Spark promise to transform how businesses leverage Big Data, finding the right mix of flexible deployments, elastic scalability, and performance can be daunting.
Introducing Rackspace OnMetal™ for Apache Spark™, an industry first that combines the performance and efficiency of bare metal with the ease and flexibility of cloud. With Rackspace OnMetal for Cloud Big Data Platform you can transform how you run Hadoop and Spark workloads:
•Deploy in minutes, not months
•Spin instances up or down on demand
•Process data in-memory for faster query times
•Get bare metal performance and say goodbye to virtualization taxes
Sign up and learn how Rackspace OnMetal for Cloud Big Data Platform can rapidly move your organization from planning to deploying.
DoneDeal AWS Data Analytics Platform built using AWS products: EMR, Data Pipeline, S3, Kinesis, Redshift and Tableau. The custom ETL was written using PySpark.
Cloud-native Semantic Layer on Data Lake - Databricks
With larger volumes of increasingly real-time data stored in the data lake, managing that data and serving analytics and applications becomes more complex. With different service interfaces, data calibers, and performance biases across scenarios, business users begin to lose confidence in the quality and efficiency of getting insight from data.
NRB - BE MAINFRAME DAY 2017 - Data spark and the data federation - NRB
Frank Van der Wal - Technical Lead IBM Z BENELUX Digital Transformation Specialist
Leif Pedersen - IBM Analytics for IBM Z Specialist at IBM
Mainframe Innovation Tour (API economy, Hybrid Cloud, Enterprise Linux, Machine Learning, Spark)
Overview of Apache Trafodion (incubating), Enterprise Class Transactional SQL-on-Hadoop DBMS, with operational use cases, what it takes to be a world class RDBMS, some performance information, and the new company Esgyn which will leverage Apache Trafodion for operational solutions.
Data and Analytics at Holland & Barrett: Building a '3-Michelin-star' Data Pl... - Dobo Radichkov
This presentation, delivered at the AWS London Summit 2023, provides an in-depth look at how Holland & Barrett built a robust, high-performing data platform on AWS to drive insights at the speed of thought. Dobo Radichkov, Chief Data Officer, shares key aspects of the data strategy, outlining how the company utilised AWS Redshift, Metabase, and Retool to create an efficient data lake, data warehouse, and analytics layer. The presentation also discusses the transformative impact of this data infrastructure on various business areas, including Finance, Commercial, Supply Chain, Customer, Digital, and Wellness. Through this data-driven journey, Holland & Barrett aims to become the beating heart of the organization, unlocking success for colleagues, customers, and partners alike.
In the presentation, Dobo Radichkov lays out Holland & Barrett's vision to make their Data & Analytics team the heartbeat of the organization, a vision that has guided their strategy and tool selection. He explains how this vision is brought to life through their organizational structure, comprising six specialized teams: Data Engineering, Data Warehouse, Business Intelligence, Data Science, Web & App Analytics, and Digital Analytics.
Dobo takes the audience through the company's strategic roadmap, a three-phase plan guiding the growth and development of their data capabilities. This roadmap isn’t just a technological plan but signifies a transformational journey for the team, aiming to embed data-driven decision-making in the DNA of Holland & Barrett.
Lastly, he showcases the '3-Michelin-star' data platform's architecture, painting a clear picture of how data moves from raw systems to the operational master data and, finally, to the analytics layer. The presentation concludes by highlighting how the newly formed data platform drives core business value and innovation across various business domains, reinforcing Holland & Barrett's commitment to becoming a data-led organization.
AWS Summit 2013 | India - Petabyte Scale Data Warehousing at Low Cost, Abhish... - Amazon Web Services
Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud. In this session we'll give an introduction to the service and its pricing before diving into how it delivers fast query performance on data sets ranging from hundreds of gigabytes to a petabyte or more.
From Data to Services at the Speed of Business - Ali Hodroj
From Data to Services at the Speed of Business: Applying cloud-native paradigm to combine fast data analytics with microservices architecture for hybrid workloads.
Data-driven analytics is making a measurable impact on business performance, helping companies pinpoint new sources of revenue and streamline operations. But traditional computing systems are challenged to keep up with a rapidly evolving data management landscape.
How do you foster superior efficiency, flexibility, and economy while meeting diverse and pressing analytics needs?
SAP® Sybase IQ and Dobler Consulting can help:
Traditional database systems were meant for processing transactions, but SAP® Sybase® IQ server is a highly efficient RDBMS optimized for extreme-scale EDWs and Big Data analytics – offering you faster data loading and query performance while slashing maintenance, hardware, and storage costs. Realize exponential improvement, even as thousands of employees and massive amounts of data (structured and unstructured) enter your ecosystem.
With SAP Sybase IQ 16 you can:
• Exploit the value of Big Data and incorporate into everyday business decision-making
• Transform your business through deeper insight by enabling analytics on real-time information
• Extend the power of analytics across your enterprise with speed, availability and security.
Please join us to learn the value offered by SAP Sybase IQ 16. And, see how by tying together your organization’s data assets – from operational data to external feeds and Big Data – SAP dramatically simplifies data management landscapes for both current and next-generation business applications, delivering information at unprecedented speeds and empowering a Big Data-enabled Enterprise Data Warehouse.
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud - DataWorks Summit
The world’s largest enterprises run their infrastructure on Oracle, DB2 and SQL and their critical business operations on SAP applications. Organisations need this data to be available in real time to conduct necessary analytics. However, delivering this heterogeneous data at the speed required can be a huge challenge because of complex underlying data models and structures and legacy manual processes that are prone to errors and delays.
Unlock these silos of data and enable the new advanced analytics platforms by attending this session.
Find out how to:
• Overcome common challenges faced by enterprises trying to access their SAP data
• Integrate SAP data in real-time with change data capture (CDC) technology
• Stream SAP data into Kafka with Attunity Replicate for SAP, as other organisations are doing
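The change data capture pattern the session covers can be illustrated with a minimal snapshot diff in Python. Real CDC tools such as Attunity Replicate read database transaction logs rather than comparing snapshots; this sketch only shows the shape of the change events that would be streamed into Kafka, and the table contents are made up:

```python
def capture_changes(before, after):
    """Diff two keyed snapshots into insert/update/delete change events."""
    events = []
    for key, row in after.items():
        if key not in before:
            events.append(("insert", key, row))
        elif before[key] != row:
            events.append(("update", key, row))
    for key, row in before.items():
        if key not in after:
            events.append(("delete", key, row))
    return events

# Two snapshots of a hypothetical customer table, keyed by id
before = {1: {"name": "Acme", "status": "active"},
          2: {"name": "Globex", "status": "active"}}
after  = {1: {"name": "Acme", "status": "inactive"},
          3: {"name": "Initech", "status": "active"}}

events = capture_changes(before, after)
# each event could then be published to a Kafka topic downstream
```

Log-based CDC produces the same three event kinds (insert, update, delete) but captures them as they happen, without the full-table scans a snapshot diff requires.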
Speakers:
John Hol, Regional Director, Attunity
Mike Hollobon, Director Business Development, IBT
At the Data-centric Architecture Forum 2020, Thomas Cook, our Sales Director of AnzoGraph DB, gave his presentation "Knowledge Graph for Machine Learning and Data Science". These are his slides.
Turn Data into Business Value – Starting with Data Analytics on Oracle Cloud ... - Lucas Jellema
Data Science, Business Intelligence, Data Lake, Machine Learning and AI. Diverse terminology with a common goal: leverage data to realize business value. Through consolidated insight and automated processing, predictions, recommendations and actions. Using visualizations, dashboards, reports, alerts, machine learning models. Based on data. Data retrieved from raw sources into a data lake, wrangled into cleansed, enriched, anonymized and aggregated data sets and turned into business intelligence or used for training machine learning models, that in turn power Smart Applications. This session walks the audience through the start to end data flow on Oracle Autonomous Data Warehouse, Analytics Cloud, Big Data Cloud & Data Integration Platform.
As presented at the CloudBrew 2019 conference on Dec 14, 2019.
Love cognitive services but not sure how to use them at scale? Enjoy working with Apache Spark but always searching for a way to integrate AI and better machine learning algorithms? Now you can do it all: run Azure Cognitive Services within Azure Databricks. Curious how? Come to this talk to learn how it works, what it means, and best practices for performance tuning.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working with unstructured data. Speakers present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with the NJ Gen AI Meetup lead, Prasad, and Procure.FYI's co-founder
Unleashing the Power of Data: Choosing a Trusted Analytics Platform - Enterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
Adjusting OpenMP PageRank: SHORT REPORT / NOTES - Subhajit Sahu
For massive graphs that fit in RAM but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). On the other hand, the hybrid approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
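Setting the OpenMP parallelization aside, the PageRank power iteration being accelerated can be sketched sequentially in Python; the damping factor and tolerance below are conventional illustrative choices, not values taken from the report:

```python
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank on an adjacency-list graph {node: [out-neighbors]}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = graph[v]
            if out:
                # distribute this node's rank along its out-edges
                share = damping * rank[v] / len(out)
                for u in out:
                    new_rank[u] += share
            else:
                # dangling node: spread its rank over all nodes
                share = damping * rank[v] / n
                for u in nodes:
                    new_rank[u] += share
        done = sum(abs(new_rank[v] - rank[v]) for v in nodes) < tol
        rank = new_rank
        if done:
            break
    return rank

# tiny example: a 3-cycle, where every node should end up with rank 1/3
ranks = pagerank({0: [1], 1: [2], 2: [0]})
```

The uniform OpenMP approach parallelizes every step of this loop across threads, while the hybrid approach keeps small primitives (like the per-node sums) sequential to avoid threading overhead.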
Enhanced Enterprise Intelligence with your personal AI Data Copilot - GetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is growing interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach to LLM context and prompt augmentation when building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source) copilot?
How can we build one?
Architecture and evaluation
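The retrieval-augmentation step at the heart of such a copilot can be sketched in a few lines of Python. Here a toy keyword-overlap retriever stands in for a real vector store, and the prompt template and document contents are purely illustrative:

```python
def retrieve(question, documents, k=2):
    """Rank documents by naive keyword overlap with the question (a stand-in
    for embedding similarity search against a vector database)."""
    q_terms = set(question.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_terms & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(question, documents):
    """Augment the user question with retrieved context before calling an LLM."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = [
    "The sales table lives in the warehouse schema and is refreshed nightly.",
    "Access to HR data requires approval from the data governance team.",
    "Dashboards are served from the analytics schema.",
]
prompt = build_prompt("Which schema holds the sales table?", docs)
# the prompt now grounds the LLM's answer in the company's own documentation
```

A production RAG pipeline replaces the keyword scorer with embedding search, but the flow is the same: retrieve relevant context, inject it into the prompt, then generate.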
Learn SQL from basic queries to advanced queries - manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
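The progression from basic retrieval and filtering to aggregation described above can be tried directly with Python's built-in sqlite3 module; the table and column names here are made up for the example:

```python
import sqlite3

# In-memory database with a small example table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, "north", 120.0),
    (2, "south", 80.0),
    (3, "north", 200.0),
    (4, "south", 50.0),
])

# Foundations: retrieval with filtering and ordering
rows = conn.execute(
    "SELECT id, amount FROM orders WHERE amount > 75 ORDER BY amount DESC"
).fetchall()

# A step further: aggregation per group, keeping only groups above a threshold
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders "
    "GROUP BY region HAVING SUM(amount) > 150"
).fetchall()

print(rows)    # [(3, 200.0), (1, 120.0), (2, 80.0)]
print(totals)  # [('north', 320.0)]
```

Note how `WHERE` filters individual rows before grouping, while `HAVING` filters the aggregated groups afterwards; this distinction is a common stumbling block when moving to advanced queries.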
The Building Blocks of QuestDB, a Time Series Database - javier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps as just another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, and faster batch ingestion.
3. Spark processes and analyzes data from ANY data source
[Diagram: business applications and Business Intelligence tools sit on top of Apache Spark and its components (Spark SQL, Spark Streaming, MLlib for machine learning, and GraphX), which draw on data sources such as Hadoop, databases, mainframes, and data warehouses.]
visit www.spark.tc for more information (IBM | Spark)
4. Spark is complementary to Hadoop, but much faster, with in-memory performance
[Chart: running time of logistic regression in Hadoop vs. Spark - Hadoop: 110, Spark: 0.9]
5. Clients are evolving their approach to data
[Chart: Big Data Maturity curve - value grows as clients move from Operations, through Data Warehousing and Line of Business and Analytics, to New Business Imperatives; "We are here" marks the current stage.]
Lower the Cost of Storage - Warehouse Modernization:
• Data lake
• Data offload
• ETL offload
• Queryable archive and staging
Data-informed Decision Making:
• Full dataset analysis (no more sampling)
• Extract value from non-relational data
• 360 view of all enterprise data
• Exploratory analysis and discovery
Business Transformation:
• Create new business models
• Risk-aware decision making
• Fight fraud and counter threats
• Optimize operations
• Attract, grow, retain customers
6. Why does Spark matter to a business?
1. Spark makes it easier to access and work with all data
2. Spark lets you develop line-of-business applications faster
3. Spark learns from data and delivers in real-time
- Enables new data-based use cases
- All data: Internal/External, Structured/Unstructured
- Real-time insights, from all data sources
- Automates analytics with machine learning
- Clients that lead in data, lead in their industry
7. IBM has the largest investment in Spark of any company in the world
• IBM Spark Technology Center
• Top committer/contributor
• 300+ inventors
• Commitment to educate 1 million data scientists
• Contributed SystemML
• Founding member of AMPLab
• Partnerships in the ecosystem
8. For Apache Spark news and innovation from the Spark Technology Center, sign up for the newsletter at www.spark.tc