Architecting a Data Platform for Enterprise Use (Strata NY 2018) - Mark Madsen
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build multi-use data infrastructure that is not subject to past constraints. This tutorial covers design assumptions, design principles, and how to approach the architecture and planning for multi-use data infrastructure in IT.
Long:
The goal in most organizations is to build multi-use data infrastructure that is not subject to past constraints. This session will discuss hidden design assumptions, review design principles to apply when building multi-use data infrastructure, and provide a reference architecture to use as you work to unify your analytics infrastructure.
The focus in our market has been on acquiring technology, and that ignores the more important part: the larger IT landscape within which this technology lives and the data architecture that lies at its core. If one expects longevity from a platform then it should be a designed rather than accidental architecture.
Architecture is more than just software. It starts from use and includes the data, technology, methods of building and maintaining, and organization of people. What are the design principles that lead to good design and a functional data architecture? What are the assumptions that limit older approaches? How can one integrate with, migrate from or modernize an existing data environment? How will this affect an organization's data management practices? This tutorial will help you answer these questions.
Topics covered:
* A brief history of data infrastructure and past design assumptions
* Categories of data and data use in organizations
* Data architecture
* Functional architecture
* Technology planning assumptions and guidance
According to our most recent statistics, Vietnam is starting to shift from outsourcing to product development. As a result of this trend, developers want to advance their careers by being more innovative. Skilled developers spend very little of their time searching for jobs on websites, instead finding them through online communities, events and social networks.
I've noticed that IT recruitment is changing in many regions, and a lot of IT companies have changed their game as developer recruiting increases. Recently, more and more companies have been moving their development departments to Vietnam.
This quarter, we partnered with hundreds of top IT recruiters and conducted deep research on thousands of developers in our database in order to understand their needs. Let's explore!
The 3 Key Barriers Keeping Companies from Deploying Data Products - Dataiku
Getting from raw data to deploying data-driven solutions requires technology, data, and people, all of which exist. So why aren't we seeing more truly data-driven companies? What's missing, and why? During Strata Hadoop World Singapore 2015, Pauline Brown, Director of Marketing at Dataiku, explains how a lack of collaboration is keeping companies from building and deploying data products effectively. Learn more about Dataiku and Data Science Studio: www.dataiku.com
Herding Cats: Migrating Dozens of Oddball Analytics Systems to Apache Spark w... - Databricks
HP ships millions of PCs, printers and other devices every year to customers in all market segments. Many of these systems have had various generations of data collection and reporting, going
back as many as 16 years. That has led to a significant sprawl of custom data formats, specialized code and numerous brittle legacy systems collecting, analyzing and reporting data.
This session will focus on samples of HP’s journey to find, catalog and ultimately eliminate these systems by migrating to Apache Spark with Databricks in the cloud. Hear about HP’s challenges dealing with legacy systems (some even located under engineers' desks) and how the power of AWS, Spark, and visualization tools has significantly simplified their migrations. You’ll also learn how the success of this endeavor lies not just in counting the number of systems deprecated, but also in how the process is evolving into building companywide shared Spark libraries, notebooks and web services that are accelerating future migrations and analysis using Spark.
Building Resiliency and Agility with Data Virtualization for the New Normal - Denodo
Watch: https://bit.ly/327z8UM
While the impact of COVID-19 is uniform across organisations in the region, much of how an organisation recovers from the impact and thrives in the market depends on its resiliency and business agility. An organisation’s data management strategy holds the key as it tackles the challenges of siloed data sources, optimising for operational stability, and ensuring real-time delivery of consistent and reliable information, irrespective of the data source or format.
Join this session to hear why large organisations are implementing Data Virtualization, a modern data integration approach, in their data architecture to build resiliency, enhance business agility, and save costs.
In this session, you will learn:
- How to deliver a clear strategy for agile data delivery across the enterprise without the pains of traditional data integration
- How to provide a robust yet simple architecture for data governance, master data, data trust, data privacy and data access security implementation, all from a single unified framework
- How to deploy digital transformation initiatives for Agile BI, Big Data, Enterprise Data Services & Data Governance
How to Build a Successful Data Team - Florian Douetteau (@Dataiku) - Dataiku
As you walk into your office on Monday morning, before you've even had a chance to grab a cup of coffee, your CEO asks to see you. He's worried: both customer churn and fraudulent transactions have increased over the past 6 months. As Data Manager, you have 6 months to solve this problem.
As Data Manager, you know the challenges ahead:
- Multitudes of technology choices to make
- Building a team and solving the skill-set disconnect
- Data can be deceiving...
- Figuring out what the successful data product must be
Florian has worked in the “data” field since ’01, back when it was not yet big. He worked at successful startups in the search engine, advertising, and gaming industries, holding various data and CTO roles. He started Dataiku in 2013, his first venture as a CEO, with the goal of alleviating the daily pains encountered by data teams all around.
How to Build a Successful Data Team - Florian Douetteau @ PAPIs Connect - PAPIs.io
As you walk into your office on Monday morning, before you've even had a chance to grab a cup of coffee, your CEO asks to see you. He's worried: both customer churn and fraudulent transactions have increased over the past 6 months. As Data Manager, you have 6 months to solve that.
As Data Manager, you know the challenges ahead:
Multitudes of technology choices to make
Building a team and solving the skill-set disconnect
Data can be deceiving...
Figuring out what the successful data product must be
The goal of this talk is to provide some perspective on these topics.
Florian has worked in the “data” field since ’01, back when it was not yet big. He worked at successful startups in the search engine, advertising and gaming industries, holding various data and CTO roles. He started Dataiku in 2013, his first venture as a CEO, with the goal of alleviating the daily pains of data enthusiasts and letting them express their creativity.
Analytics-Enabled Experiences: The New Secret Weapon - Databricks
Tracking and analyzing how our individual products come together has always been an elusive problem for Steelcase. Our problem can be thought of in the following way: “we know how many Lego pieces we sell, yet we don’t know what Lego set our customers buy.” The Data Science team took over this initiative, which resulted in an evolution of our analytics journey. It is a story of innovation, resilience, agility and grit.
The effects of the COVID-19 pandemic on corporate America shone a spotlight on office furniture manufacturers to find ways in which the office can be made safe again. The team would never have imagined how relevant our work on product application analytics would become. Product application analytics became an industry priority overnight.
The proposal presented this year is the story of how data science is helping corporations bring people back to the office and set the path to lead the reinvention of the office space.
After groundbreaking milestones to overcome technical challenges, the most important question is: What do we do with this? How do we scale this? How do we turn this opportunity into a true competitive advantage? The response: stop thinking about this work as a data science project and start to think about this as an analytics-enabled experience.
During our session we will cover the technical elements that we overcame as a team to set-up a pipeline that ingests semi-structured and unstructured data at scale, performs analytics and produces digital experiences for multiple users.
This presentation will be particularly insightful for Data Scientists, Data Engineers, and analytics leaders who are seeking to better understand how to augment the value of data for their organization.
Intergen's newsletter, Smarts, now available for online reading.
Intergen provides information technology solutions across Australia, New Zealand and the world based exclusively on Microsoft’s tools and technologies.
[DevDay2019] Why you'll lose without UX Design - By Szilard Toth, CTO at e·pi... - DevDay.org
UX Design is on a radical rise. The most successful companies like Google or Uber know that great UX is no longer a nice-to-have but a key business driver. Szilard Toth (CTO e·pilot) and Nicolas Python (Head of Design KLARA) talk about their own experience of UX Design in modern engineering environments. Whether you're a business leader or an engineer, learn why you'll lose without UX Design.
Predictive Modeling & Data-Driven Product Insights at LinkedIn - Scott Nicholson
Talk given at Advanced Analytics & Big Data Forum conference in San Francisco on April 25, 2012.
Abstract: Data on 150+ million professionals' careers and networks provide a fascinating playground for analysts to discover data insights about career trends, the social web and the economy. This talk will focus on how insights extracted from the LinkedIn dataset enable individuals with limited information the ability to make better decisions about their professional lives. In the course of this theme we will discuss data tools, insights and approaches to predictive modeling in the context of the LinkedIn dataset and Analytics Team.
Business and IT alignment, how to escape from the sand-trap - pierino23
The need for IT to be fast enough to meet Business demand is a false problem, and can even be dangerous, leading to disastrous solutions that often create more and more inertia within IT itself, leaving it less and less capable of keeping pace with the Business. Moreover, IT innovation will boost the business even further, leaving IT in the dust. The approach to pursue is to make the Business "ride" IT and create a virtuous circle.
The talk will describe the risks of insisting on Business-IT alignment and investigate how to leverage MDA as a way to successfully have the Business "ride" IT.
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil... - Simplilearn
In this presentation, we will decode the basic differences between data scientist, data analyst and data engineer, based on the roles and responsibilities, skill sets required, salary and the companies hiring them. Although all these three professions belong to the Data Science industry and deal with data, there are some differences that separate them. Every person who is aspiring to be a data professional needs to understand these three career options to select the right one for themselves. Now, let us get started and demystify the difference between these three professions.
We will distinguish these three professions using the parameters mentioned below:
1. Job description
2. Skillset
3. Salary
4. Roles and responsibilities
5. Companies hiring
This Master’s Program provides training in the skills required to become a certified data scientist. You’ll learn the most in-demand technologies such as Data Science on R, SAS, Python, Big Data on Hadoop and implement concepts such as data exploration, regression models, hypothesis testing, Hadoop, and Spark.
Why be a Data Scientist?
Data scientist is the pinnacle rank in an analytics organization. Glassdoor has ranked data scientist first in the 25 Best Jobs for 2016, and good data scientists are scarce and in great demand. As a data scientist you will be required to understand the business problem, design the analysis, collect and format the required data, apply algorithms or techniques using the correct tools, and finally make recommendations backed by data.
Simplilearn's Data Scientist Master’s Program will help you master skills and tools like Statistics, Hypothesis testing, Clustering, Decision trees, Linear and Logistic regression, R Studio, Data Visualization, Regression models, Hadoop, Spark, PROC SQL, SAS Macros, Statistical procedures, tools and analytics, and many more. The courseware also covers a capstone project which encompasses all the key aspects from data extraction, cleaning, visualisation to model building and tuning. These skills will help you prepare for the role of a Data Scientist.
Who should take this course?
The data science role requires the perfect amalgam of experience, data science knowledge, and using the correct tools and technologies. It is a good career choice for both new and experienced professionals. Aspiring professionals of any educational background with an analytical frame of mind are most suited to pursue the Data Scientist Master’s Program, including:
IT professionals
Analytics Managers
Business Analysts
Banking and Finance professionals
Marketing Managers
Supply Chain Network Managers
Those new to the data analytics domain
Students in UG/ PG Analytics Programs
Learn more at https://www.simplilearn.com/big-data-and-analytics/senior-data-scientist-masters-program-training
The world of testers has been changing a lot in the last 10 years, and the change continues at an ever-increasing speed! In this pre-conference keynote, Derk-Jan de Grood and Jan Jaap Cannegieter will highlight changes and trends that will influence the way we do our work. This introduces challenges for testers, today and in the next few years. The challenges relate, for instance, to test automation, Continuous Integration and Deployment, technical and functional knowledge, how to deal with (senior) management, working in multi-disciplinary teams, and organisations that change their business model. During this pre-conference keynote you will be informed and challenged: which knowledge do you need to develop to prepare yourself for the future? Derk-Jan and Jan Jaap will discuss how you can prepare for these challenges and will provide a guide to the ATD2019 program. Get the most out of this conference and attend the sessions that are most helpful in preparing for these challenges.
In this deck, I am going to cover the difference between Kaggle & Real-world projects.
Now grab my content on your favorite platform:
YouTube: https://www.youtube.com/watch?v=IhGpuLFv4Ho
SoundCloud: https://soundcloud.com/ankitrathi/kaggle-vs-real-world-projects
GitHub: https://ankit-rathi.github.io/data-and-ai/markdown/2020/06/25/Kaggle-Vs-Real-world-Projects.html
LinkedIn: https://www.linkedin.com/pulse/kaggle-vs-real-world-projects-ankit-rathi
This is the pilot episode of Data & AI Platform Concepts, where I intend to cover all aspects of Data & AI Platforms: Data Systems, Analytics/ML/AI, Cloud Services, DevOps, and Governance.
https://www.ankitrathi.com
Data & AI Platforms — Open Source Vs Managed Services (AWS vs Azure vs GCP) - Ankit Rathi
While designing and building Data & AI platforms, you need to evaluate the options available: whether your platform will run on-premise, use one or more cloud services, or take a hybrid approach.
In any case, you will need to evaluate various tools and services for your ingestion, storage, processing/analysis, and serving layers.
In this post, I have mapped open-source and popular managed cloud services to make our evaluation process a bit easier.
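As a rough sketch of the kind of mapping described above, the following table pairs a common open-source tool for each platform layer with its closest managed counterpart on each cloud. This is an illustrative, non-exhaustive selection, and the pairings are approximate rather than one-to-one equivalents.

```python
# Illustrative mapping of data-platform layers to open-source tools and
# their closest managed-cloud counterparts. Pairings are approximate.
LAYER_OPTIONS = {
    "ingestion": {
        "open_source": "Apache Kafka",
        "aws": "Amazon Kinesis",
        "azure": "Azure Event Hubs",
        "gcp": "Google Cloud Pub/Sub",
    },
    "storage": {
        "open_source": "HDFS",
        "aws": "Amazon S3",
        "azure": "Azure Data Lake Storage",
        "gcp": "Google Cloud Storage",
    },
    "processing": {
        "open_source": "Apache Spark",
        "aws": "Amazon EMR",
        "azure": "Azure Databricks",
        "gcp": "Google Cloud Dataproc",
    },
    "serving": {
        "open_source": "Presto/Trino",
        "aws": "Amazon Athena",
        "azure": "Azure Synapse",
        "gcp": "Google BigQuery",
    },
}

def options_for(layer):
    """Return the candidate tools for one platform layer."""
    return LAYER_OPTIONS[layer]
```

A table like this makes the evaluation concrete: for each layer you can compare one open-source option against the managed services of the clouds you are considering.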
Welcome to my post on ‘Architecting Modern Data Platforms’, here I will be discussing how to design cutting edge data analytics platforms which meet the ever-evolving data & analytics needs for the business.
https://www.ankitrathi.com
Welcome again to my course, ‘How To Build Your Career In Artificial Intelligence — Do-It-Yourself Course’, which introduces a learning framework for beginners to get into AI.
I introduced the course in the last post; let's have a look at the course outline in this one.
Welcome to my new course, ‘How To Build Your Career In Artificial Intelligence — Do-It-Yourself Course’, which introduces a learning framework for beginners to get into AI.
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ..., by Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged can save iteration time. Skipping in-identical vertices (those with the same in-links) helps avoid duplicate computations and thus can also reduce iteration time. Road networks often contain chains which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes are easy to calculate; this can reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which can reduce the iteration time and the number of iterations, and also enables multi-iteration concurrency in the computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
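For readers unfamiliar with the baseline these optimizations apply to, here is a minimal power-iteration PageRank sketch. This is a hypothetical illustration of the standard (monolithic) method, not the Levelwise or STICD implementation from the report; the graph representation (a dict of out-link lists) is chosen purely for clarity.

```python
# Minimal power-iteration PageRank. Graph is a dict: vertex -> list of
# out-neighbours. Handles dangling vertices by spreading their rank mass
# uniformly, as in the standard formulation.
def pagerank(graph, damping=0.85, tol=1e-6, max_iter=100):
    vertices = list(graph)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(max_iter):
        # Rank mass held by dangling vertices (no out-links).
        dangling = sum(rank[v] for v in vertices if not graph[v])
        # Teleport term plus the uniformly redistributed dangling mass.
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in vertices}
        # Each vertex shares its damped rank equally among its out-links.
        for v in vertices:
            out = graph[v]
            if out:
                share = damping * rank[v] / len(out)
                for u in out:
                    new_rank[u] += share
        # Stop once the L1 change across all vertices is below tolerance;
        # per-vertex versions of this check enable the "skip converged
        # vertices" optimization mentioned above.
        done = sum(abs(new_rank[v] - rank[v]) for v in vertices) < tol
        rank = new_rank
        if done:
            break
    return rank
```

On a 3-cycle (1 → 2 → 3 → 1) the ranks are equal by symmetry, which makes a handy sanity check.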
Unleashing the Power of Data: Choosing a Trusted Analytics Platform (PDF), by Enterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
Enhanced Enterprise Intelligence with your personal AI Data Copilot (PDF), by GetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built, on these three concepts, a robust Data Copilot that helps democratize access to company data assets and boosts the performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
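The RAG pattern the talk refers to can be sketched in a few lines: retrieve the most relevant documents for a question, then augment the prompt with them before calling the model. Everything below is a hypothetical illustration of that pattern, not GetInData's implementation; the toy word-overlap retriever stands in for a real vector-database similarity search (e.g. Milvus), and the LLM call itself is omitted.

```python
# Minimal Retrieval-Augmented Generation (RAG) loop sketch.
def retrieve(question, documents, k=2):
    """Rank documents by naive word overlap with the question
    (a stand-in for a vector-database similarity search)."""
    q_words = set(question.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(question, context_docs):
    """Augment the user question with retrieved context before it is
    sent to an LLM (the LLM call itself is not shown here)."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\n"
            f"Question: {question}")

# Hypothetical data-platform documentation snippets.
docs = [
    "Table sales has columns order_id, amount, region.",
    "Table users has columns user_id, signup_date.",
    "The warehouse refreshes nightly at 02:00 UTC.",
]
question = "Which columns are in the sales table?"
prompt = build_prompt(question, retrieve(question, docs))
```

In a production copilot, the retriever would query an embedding index and the assembled prompt would go to the chosen LLM; the structure of the loop stays the same.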
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
24. Data Science is a Team Game
Data Professionals: Job of the Century
25. Mr A alone can’t do everything
26. All need to get involved & contribute
27. Hence Data Professionals, not Data Scientists
Business Sponsor, Business Analyst, Data Engineer, Cloud Engineer, DevOps Engineer, Project Manager
30. Mr A is a Data Professional
31. And so is everyone else in the team
32. Expert in their domain, good enough in others
Business Sponsor, Business Analyst, Data Engineer, Cloud Engineer, DevOps Engineer, Project Manager
33. Everyone is on the same page now
34. Mr A is excited again, ready to make a mark
35. Business is happy, expecting great results