The talk first gives an architectural overview of the UIA components and how they interact. A use case then shows how, in the "UIA Data Reservoir", current data stored cost-effectively "as is" in a Hadoop file system (HDFS) and refined data held in an Oracle 12c data warehouse can be combined with each other, analyzed via direct access in Oracle Business Intelligence, or explored for new relationships with Endeca Information Discovery.
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012 by Jonathan Seidman
A look at common patterns being applied to leverage Hadoop with traditional data management systems and the emerging landscape of tools which provide access and analysis of Hadoop data with existing systems such as data warehouses, relational databases, and business intelligence tools.
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc... by Cloudera, Inc.
"Amr Awadallah served as the VP of Engineering of Yahoo's Product Intelligence Engineering (PIE) team for a number of years. The PIE team was responsible for business intelligence and advanced data analytics across a number of Yahoo's key consumer-facing properties (search, mail, news, finance, sports, etc.). Amr will share the data architecture that PIE had implemented before Hadoop was deployed and the headaches that architecture entailed. Amr will then show how most, if not all, of these headaches were eliminated once Hadoop was deployed. Amr will illustrate how Hadoop and relational databases complement each other within the traditional business intelligence data stack, and how that enables organizations to access all their data under different operational and economic constraints."
Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ... by Cloudera, Inc.
Analyzing new and diverse digital data streams can reveal new sources of economic value, provide fresh insights into customer behavior and identify market trends early on. But this influx of new data can create challenges for IT departments. To derive real business value from Big Data, you need the right tools to capture and organize a wide variety of data types from different sources, and to be able to easily analyze it within the context of all your enterprise data. Attend this session to learn how Oracle’s end-to-end value chain for Big Data can help you unlock the value of Big Data.
The Cloudera Impala project is pioneering the next generation of Hadoop capabilities: the convergence of interactive SQL queries with the capacity, scalability, and flexibility of a Hadoop cluster. In this webinar, join Cloudera and MicroStrategy to learn how Impala works, how it is uniquely architected to provide an interactive SQL experience native to Hadoop, and how you can leverage the power of MicroStrategy 9.3.1 to easily tap into more data and make new discoveries.
Hadoop-DS: Which SQL-on-Hadoop Rules the Herd by IBM Analytics
Originally Published on Oct 27, 2014
An overview of IBM's audited Hadoop-DS comparing IBM Big SQL, Cloudera Impala and Hortonworks Hive for performance and SQL compatibility. For more information, visit: http://www-01.ibm.com/software/data/infosphere/hadoop/
Innovate Analytics with Oracle Data Mining & Oracle R by Capgemini
The “big data” buzz has garnered a lot of interest lately. Many assume that big data and discovering newfound insights about currently collected data involve investment and skilling up on new tools. Truth be told, you may already have the tools needed to tap into your big data potential.
In this webinar, learn how:
- Oracle Data Mining and Oracle R can generate insights without the hype or large investment in big data products
- Oracle Data Mining integrates seamlessly with Oracle Business Intelligence Enterprise Edition (OBIEE)
- Statistical analytics can be achieved by using Oracle R
Come away with insights on real-world scenarios to implement quick wins at your organization.
http://www.capgemini.com/oracle
This talk was given by Marcel Kornacker at the 11th meeting on April 7, 2014.
Impala (impala.io) raises the bar for SQL query performance on Apache Hadoop. With Impala, you can query Hadoop data – including SELECT, JOIN, and aggregate functions – in real time to do BI-style analysis. As a result, Impala makes a Hadoop-based enterprise data hub function like an enterprise data warehouse for native Big Data.
Brief Introduction about Hadoop and Core Services by Muthu Natarajan
A quick introduction to Hadoop, Big Data, Business Intelligence, and the other core services and programs involved in using Hadoop as a successful tool for Big Data analysis.
My understanding of Big Data:
“Data” becomes “information”, but big data now turns information into “knowledge”, “knowledge” becomes “wisdom”, and “wisdom” turns into “business” or “revenue”, provided you use it promptly and in a timely manner.
Introduction to Apache Hadoop. Covers Hadoop v.1.0 and HDFS / MapReduce through v.2.0, as well as Impala, YARN, Tez and the entire arsenal of projects for Apache Hadoop.
Overview of Big data, Hadoop and Microsoft BI - version1 by Thanh Nguyen
Big Data and advanced analytics are critical topics for executives today. But many still aren't sure how to turn that promise into value. This presentation provides an overview of 16 examples and use cases that lay out the different ways companies have approached the issue and found value: everything from pricing flexibility to customer preference management to credit risk analysis to fraud protection and discount targeting. For the latest on Big Data & Advanced Analytics: http://mckinseyonmarketingandsales.com/topics/big-data
Tame Big Data with Oracle Data Integration by Michael Rainey
In this session, Oracle Product Management covers how Oracle Data Integrator and Oracle GoldenGate are vital to big data initiatives across the enterprise, providing the movement, translation, and transformation of information and data not only heterogeneously but also in big data environments. Through a metadata-focused approach for cataloging, defining, and reusing big data technologies such as Hive, Hadoop Distributed File System (HDFS), HBase, Sqoop, Pig, Oracle Loader for Hadoop, Oracle SQL Connector for Hadoop Distributed File System, and additional big data projects, Oracle Data Integrator bridges the gap in the ability to unify data across these systems and helps deliver timely and trusted data to analytic and decision support platforms.
Co-presented with Alex Kotopoulis at Oracle OpenWorld 2014.
How pig and hadoop fit in data processing architecture by Kovid Academy
Pig, developed by Yahoo Research in 2006, enables programmers to write data transformation programs for Hadoop quickly and easily without the cost and complexity of MapReduce programs.
The Apache Hadoop software library is essentially a framework that allows for the distributed processing of large datasets across clusters of computers using a simple programming model. Hadoop can scale up from single servers to thousands of machines, each offering local computation and storage.
Défis et opportunités d'une mise en œuvre conjointe e-Government et Open Gov... by Mohamed Said Ouerghi
Discussion points from the panel on "Challenges and opportunities of a joint e-Government and Open Government implementation", held during the Tunisia Smart Gov 2020 seminar in Tunis, Tunisia, on December 1, 2016.
“The increasing role of e-government in promoting inclusive and participatory development has gone hand-in-hand with the growing demands for transparency and accountability in all regions of the world,” said Sha Zukang, UN DESA Under-Secretary-General in the newly released United Nations E-government Survey 2012.
Considerations for an Effective Internal Model Method Implementation by accenture
In this Accenture Finance & Risk presentation we discuss an approach banks can use to develop, manage, and monitor a robust and effective Internal Model Method program. Learn more about the Accenture Finance & Risk Practice: bit.ly/2j2JD6X
Enabling the digital mind shift in the organisation - Enterprise Digital Summ... by David Terrar
Third of three opening keynotes at the 2015 Enterprise Digital Summit London: Stowe Boyd gave us ideas about the future of the org, Euan Semple made it personal, and I added a bit of the practical. Three key words for the presentation: Disruption. Reinvention. Education. Everyone's talking digital and it's dangerous... too dangerous to dilute the term, but crucially important that we understand it properly. Digital is becoming a synonym for technology, or new, or new technology. You need to understand the digital enterprise wave, the current disruptive landscape. Then come 8 building blocks for transformation, and then our 7E approach to implementing change. Finally I echo Michael Corleone telling Sonny "it's not personal, it's business" with our version: "it's not digital, it's business".
The agile enterprise - Digital Transformation as a practical application by die.agilen GmbH
The buzzword "digital transformation" is all the rage and will surely trigger the largest industrial revolution in more than a century. But what does this mean in concrete terms? What will the change that companies have to make look like? We will look not only at the 10 dimensions of the "Digital Maturity Level Model", which indicates how mature a company is in terms of the "digital age", but also at concrete, practice-oriented methods and processes of the digital transformation such as Scrum, Kanban, Design Thinking, Lean Startup, LEGO SERIOUS PLAY, OKR and many more. At the end of the transformation there is a new, converted corporate form: the agile enterprise.
Zinnov examines the growing trend of enterprises setting up digital labs to drive the next leg of their digital journey. Geographies with rich product development capabilities and a talent pool with key skills are emerging as hot spots for the establishment of innovative digital labs
Financial Services - New Approach to Data Management in the Digital Era by accenture
How current is your data management strategy? As technology—and the requirements and business drivers around it—changes, financial services firms will need to change their approach to data management. To guide your approach, see the three building blocks to Accenture’s data management framework covered in this presentation.
The Digital Enterprise - Alfresco Summit Keynote 2014 by John Newton
The US Federal Reserve says that productivity growth has been declining in the 21st century and IT has not necessarily been the solution. In Europe, growth has stalled completely and economies are facing the prospect of deflation. Business and operational models from the 20th century no longer scale as we face exponential growth in information, activity and connections. We no longer give workers the scope or space to get work done. Waste and ad hoc process are killing us. We must reorganise how the company works and the way we do work. We must eliminate the waste of unnecessary paperwork, busy work and communication by digitising, automating and measuring that work into a Digital Enterprise. The Digital Enterprise will integrate information, processes, work and people to collaborate more efficiently and effectively to produce more valuable products and services.
Hortonworks Oracle Big Data Integration by Hortonworks
Slides from joint Hortonworks and Oracle webinar on November 11, 2014. Covers the Modern Data Architecture with Apache Hadoop and Oracle Data Integration products.
Mr. Slim Baltagi is a Systems Architect at Hortonworks, with over 4 years of Hadoop experience working on 9 Big Data projects: Advanced Customer Analytics, Supply Chain Analytics, Medical Coverage Discovery, Payment Plan Recommender, Research Driven Call List for Sales, Prime Reporting Platform, Customer Hub, Telematics, Historical Data Platform; with Fortune 100 clients and global companies from Financial Services, Insurance, Healthcare and Retail.
Mr. Slim Baltagi has worked in various architecture, design, development and consulting roles at Accenture, CME Group, TransUnion, Syntel, Allstate, TransAmerica, Credit Suisse, Chicago Board Options Exchange, Federal Reserve Bank of Chicago, CNA, Sears, USG, ACNielsen, Deutsche Bahn.
Mr. Baltagi has also over 14 years of IT experience with an emphasis on full life cycle development of Enterprise Web applications using Java and Open-Source software. He holds a master’s degree in mathematics and is an ABD in computer science from Université Laval, Québec, Canada.
Languages: Java, Python, JRuby, JEE, PHP, SQL, HTML, XML, XSLT, XQuery, JavaScript, UML, JSON
Databases: Oracle, MS SQL Server, MySQL, PostgreSQL
Software: Eclipse, IBM RAD, JUnit, JMeter, YourKit, PVCS, CVS, UltraEdit, Toad, ClearCase, Maven, iText, Visio, JasperReports, Alfresco, YSlow, Terracotta, SoapUI, Dozer, Sonar, Git
Frameworks: Spring, Struts, AppFuse, SiteMesh, Tiles, Hibernate, Axis, Selenium RC, DWR Ajax, XStream
Distributed Computing/Big Data: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, HBase, R, RHadoop, Cloudera CDH4, MapR M7, Hortonworks HDP 2.1
Modern data management using Kappa and streaming architectures, including a discussion by eBay's Connie Yang of the Rheos platform and the use of Oracle GoldenGate, Kafka, Flink, etc.
Transform Your Business with Big Data and Hortonworks by Pactera_US
Customer insight and marketplace predictions are a few of the profitable benefits found in big data technology. Leading companies are using the advanced analytics solution to find new revenue streams, increase customer satisfaction and optimize the supply chain.
This is an in-depth look at the future of data warehouses and how SQL-on-Hadoop technologies play a pivotal role in those settings.
Matt Aslett, Research Director for 451 Research, is joined by Apache Drill architect Jacques Nadeau to share what lies ahead for enterprise data warehouse architects and BI users in 2015 and beyond.
Strata 2015 presentation from Oracle for Big Data - we are announcing several new big data products including GoldenGate for Big Data, Big Data Discovery, Oracle Big Data SQL and Oracle NoSQL
One of my old presentations to our management, covering the following topics:
- History and Milestones
- Traditional Data Warehouse
- Key trends breaking the traditional data warehouse
- Modern Data Warehouse
- Massively parallel processing (MPP) architecture
- Hadoop Ecosystem
- Technical Innovation on Hadoop
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016 by MLconf
Big Data Processing Above and Beyond Hadoop: Data-intensive computing represents a new computing paradigm to address Big Data processing requirements using high-performance architectures supporting scalable parallel processing to allow government, commercial organizations, and research environments to process massive amounts of data and implement new applications previously thought to be impractical or infeasible. The fundamental challenges of data-intensive computing are managing and processing exponentially growing data volumes, significantly reducing associated data analysis cycles to support practical, timely applications, and developing new algorithms which can scale to search and process massive amounts of data. The open source HPCC (High-Performance Computing Cluster) Systems platform offers a unified approach to Big Data processing requirements: (1) a scalable, integrated computer systems hardware and software architecture designed for parallel processing of data-intensive computing applications, and (2) a new programming paradigm in the form of a high-level, declarative, data-centric programming language designed specifically for big data processing. This presentation explores the challenges of data-intensive computing from a programming perspective, and describes the ECL programming language and the HPCC architecture designed for data-intensive computing applications. HPCC is an alternative to the Hadoop platform, and ECL is compared to Pig Latin, a high-level language developed for the Hadoop MapReduce architecture.
Similar to Oracle Unified Information Architeture + Analytics by Example (20)
Actionable Insights with AI - Snowflake for Data Science by Harald Erb
Talk @ ScaleUp 360° AI Infrastructures DACH, 2021: Data scientists spend 80% and more of their time searching for and preparing data. This talk explains Snowflake’s Platform capabilities like near-unlimited data storage and instant and near-infinite compute resources and how the platform can be used to seamlessly integrate and support the machine learning libraries and tools data scientists rely on.
From the Data Work Out event:
Performant and scalable Data Science with Dataiku DSS and Snowflake
Managing the whole process of setting up a machine learning environment from end-to-end becomes significantly easier when using cloud-based technologies. The ability to provision infrastructure on demand (IaaS) solves the problem of manually requesting virtual machines. It also provides immediate access to compute resources whenever they are needed. But that still leaves the administrative overhead of managing the ML software and the platform to store and manage the data.
A fully managed end-to-end machine learning platform like Dataiku Data Science Studio (DSS), which enables data scientists, machine learning experts, and even business users to quickly build, train and host machine learning models at scale, needs to access data from many different sources and can also access data provided by Snowflake. Storing data in Snowflake has three significant advantages: a single source of truth, a shorter data preparation cycle, and the ability to scale as you go.
Delivering rapid-fire Analytics with Snowflake and Tableau by Harald Erb
Until recently, advancements in data warehousing and analytics were largely incremental. Small innovations in database design would herald a new data warehouse every 2-3 years, which would quickly become overwhelmed with rapidly increasing data volumes. Knowledge workers struggled to access those databases with development-intensive BI tools designed for reporting, rather than exploration and sharing. Both databases and BI tools were strained in locally hosted environments that were inflexible to growth or change.
Snowflake and Tableau represent a fundamentally different approach. Snowflake’s multi-cluster shared data architecture was designed for the cloud and to handle logarithmically larger data volumes at blazing speed. Tableau was made to foster an interactive approach to analytics, freeing knowledge workers to use the speed of Snowflake to their greatest advantage.
Machine Learning - Eine Challenge für Architekten by Harald Erb
Because of the many potential business opportunities that machine learning offers, many companies are launching initiatives for data-driven innovation. They set up analytics teams, advertise new positions for data scientists, build up know-how internally, and ask the IT organization for an infrastructure for "heavy" data engineering and processing, including the provision of an analytics toolbox. Exciting challenges await IT architects here, among them collaborating with interdisciplinary teams whose members differ in their machine learning (ML) knowledge and in their tool-support needs.
Do you know what k-Means? Cluster-Analysen by Harald Erb
Cluster analyses are now "bread and butter" analysis techniques: methods used to discover similarity structures in (large) data sets, with the goal of identifying new groups in the data. The k-means algorithm is one of the simplest and best-known unsupervised learning methods and can be applied to a variety of machine learning tasks. For example, it can be used to find anomalous data points within a large data set, or to cluster text documents or customer segments. In data analysis, applying clustering methods can be a good starting point before other classification or regression methods are used.
This talk does not examine the k-means algorithm and its extensions and variants in detail; instead, it should be understood as a placeholder for other advanced analytics methods that today are "intelligent" components of modern software solutions or can be combined with them. Two short examples are shown live: (1) identifying customer clusters with a big data discovery tool and Python (Jupyter Notebook), and (2) implementing anomaly detection directly on a real-time data stream with a stream analytics solution from Oracle.
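As a rough illustration of the clustering idea described in this abstract, here is a minimal k-means sketch in plain Python. It is not taken from the talk: the function name `kmeans`, the toy `customers` data, and the choice to seed the centroids with the first `k` points are all assumptions made for the example.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd-style k-means on tuples of floats; returns (centroids, labels)."""
    # Assumption: the first k points serve as initial centroids (deterministic, not k-means++).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels

# Hypothetical customer data: (orders per year, average basket value)
customers = [(1.0, 10.0), (9.0, 95.0), (2.0, 12.0), (10.0, 100.0), (1.5, 11.0), (9.5, 98.0)]
centroids, labels = kmeans(customers, k=2)
```

On this toy data the low-volume and high-volume customers end up in separate clusters. A point's distance to its own centroid could then serve as a simple anomaly score, which is one way to read the anomaly-detection use mentioned above.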
Exploratory Analysis in the Data Lab - Team-Sport or for Nerds only? by Harald Erb
Talk held at the DOAG 2016 conference (2016.doag.org/de/home) discussing a data lab concept incl. architecture blueprint, collaboration and tool examples based on Oracle solutions like Oracle Big Data Discovery (in combination with Jupyter Notebook).
Big Data Discovery + Analytics = Datengetriebene Innovation! by Harald Erb
Talk from the DOAG 2015 conference: implementing data projects does not necessarily have to be left to the so-called data scientists alone. Data and tool complexity in dealing with Big Data are no longer insurmountable hurdles for the teams that are already responsible today for building and operating the data warehouse and for managing and further developing the business intelligence platform. In an interdisciplinary team, business users and business analysts contribute their domain knowledge to the data project from the very beginning, alongside the technical roles.
Oracle Big Data Discovery working together with Cloudera Hadoop is the fastest way to ingest and understand data. Powerful data transformation capabilities mean that data can quickly be prepared for consumption by the extended organisation.
DOAG News 2012 - Analytische Mehrwerte mit Big Data by Harald Erb
For several months now, "Big Data" has been discussed intensively but also controversially. Does this approach call the existing dominance of relational databases into question, at least for selected analytical problems? After an introductory overview, this article uses use cases to show where the business value of Big Data projects lies and how these new insights can be integrated into existing data warehouse and business intelligence projects.
Endeca Web Acquisition Toolkit - Integration verteilter Web-Anwendungen und a... by Harald Erb
The only constant is change: the critical information that companies need every day as a basis for decisions is subject to permanent change and is, moreover, spread across many internal and external sources. Whether in documents, e-mails, portals, websites, and so on, relevant data that can yield valuable insights for well-founded business decisions is found everywhere.
From a technical point of view, this information, some of which is very hard to access, first has to be acquired from the distributed applications and data sources before the actual processing in the data warehouse takes place. As a graphical development tool, the Endeca Web Acquisition Toolkit (Endeca WAT) addresses exactly this point by making it possible to build synthetic interfaces. For example, price data and/or customer reviews are to be acquired from a commercial website for which the site operator provides no API. The following article and talk outlines how the Endeca Web Acquisition Toolkit can take on integration tasks for connecting external data sources within the current Oracle Information Management Reference Architecture.
Adjusting primitives for graph : SHORT REPORT / NOTES by Subhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... Subhajit Sahu
Abstract: Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method, where all vertices are processed in each iteration. It does, however, require that the input graph contain no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by the submission of a large number of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
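A minimal sketch of the idea described in the abstract, under stated assumptions: it is not the report's implementation, all function names are hypothetical, it runs on a single thread, and it processes one strongly connected component at a time in topological order rather than grouping components into levels. Dead ends are rejected up front, matching the stated precondition. The recursive Tarjan routine is only suitable for small graphs because of Python's recursion limit.

```python
def tarjan_scc(out_edges):
    """Strongly connected components, returned in reverse topological order."""
    n = len(out_edges)
    index, low = [None] * n, [0] * n
    on_stack, stack, comps = [False] * n, [], []
    counter = 0

    def visit(u):
        nonlocal counter
        index[u] = low[u] = counter
        counter += 1
        stack.append(u)
        on_stack[u] = True
        for v in out_edges[u]:
            if index[v] is None:
                visit(v)
                low[u] = min(low[u], low[v])
            elif on_stack[v]:
                low[u] = min(low[u], index[v])
        if low[u] == index[u]:  # u is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack[w] = False
                comp.append(w)
                if w == u:
                    break
            comps.append(comp)

    for u in range(n):
        if index[u] is None:
            visit(u)
    return comps


def build_in_edges(out_edges):
    in_edges = [[] for _ in out_edges]
    for u, targets in enumerate(out_edges):
        for v in targets:
            in_edges[v].append(u)
    return in_edges


def monolithic_pagerank(out_edges, damping=0.85, tol=1e-10, max_iter=500):
    """Standard power iteration: every vertex is updated in every iteration."""
    n = len(out_edges)
    in_edges = build_in_edges(out_edges)
    ranks = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1 - damping) / n
               + damping * sum(ranks[u] / len(out_edges[u]) for u in in_edges[v])
               for v in range(n)]
        err = sum(abs(a - b) for a, b in zip(new, ranks))
        ranks = new
        if err < tol:
            break
    return ranks


def levelwise_pagerank(out_edges, damping=0.85, tol=1e-10, max_iter=500):
    """Iterate each component to convergence, upstream components first."""
    n = len(out_edges)
    if any(not targets for targets in out_edges):
        raise ValueError("dead ends must be removed before levelwise processing")
    in_edges = build_in_edges(out_edges)
    ranks = [1.0 / n] * n
    for comp in reversed(tarjan_scc(out_edges)):  # topological order
        for _ in range(max_iter):
            # In-neighbours outside comp belong to upstream components,
            # so their ranks are already final.
            new = {v: (1 - damping) / n
                   + damping * sum(ranks[u] / len(out_edges[u]) for u in in_edges[v])
                   for v in comp}
            err = sum(abs(new[v] - ranks[v]) for v in comp)
            for v in comp:
                ranks[v] = new[v]
            if err < tol:
                break
    return ranks


if __name__ == "__main__":
    graph = [[1], [0, 2], [3], [2]]  # two components: {0, 1} feeds {2, 3}
    print(levelwise_pagerank(graph))
    print(monolithic_pagerank(graph))
```

Because each component only reads ranks from components that precede it, no rank values need to be exchanged between components during iteration, which is the property the abstract refers to.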
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf Enterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
Learn SQL from basic queries to Advance queries manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf GetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by AI market leaders such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). At the same time, there is growing interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach to LLM context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built on these three concepts to create a robust Data Copilot that can help democratize access to company data assets and boost the performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
Techniques to optimize the PageRank algorithm usually fall into two categories. One tries to reduce the work per iteration, and the other tries to reduce the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, i.e. vertices with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
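Of the techniques above, skipping in-identical vertices is the easiest to show in isolation. The following is a minimal sketch, not the STICD implementation: the function name `pagerank_in_identical` and its parameters are hypothetical, and the graph is assumed to have no dangling nodes. Vertices are grouped by their list of in-neighbours, and the rank expression is evaluated once per group instead of once per vertex.

```python
from collections import defaultdict


def pagerank_in_identical(out_edges, damping=0.85, tol=1e-10, max_iter=500):
    """Power-iteration PageRank that evaluates each group of in-identical
    vertices once. Returns (ranks, number of rank evaluations performed)."""
    n = len(out_edges)
    if any(not targets for targets in out_edges):
        raise ValueError("this sketch assumes a graph without dangling nodes")

    # Collect in-neighbours of every vertex.
    in_edges = [[] for _ in range(n)]
    for u, targets in enumerate(out_edges):
        for v in targets:
            in_edges[v].append(u)

    # Vertices with the same in-links always receive the same rank,
    # so they can share a single computation.
    groups = defaultdict(list)
    for v in range(n):
        groups[tuple(sorted(in_edges[v]))].append(v)

    ranks = [1.0 / n] * n
    evaluations = 0
    for _ in range(max_iter):
        new = [0.0] * n
        for sources, members in groups.items():
            value = (1 - damping) / n + damping * sum(
                ranks[u] / len(out_edges[u]) for u in sources)
            evaluations += 1
            for v in members:  # copy the shared result to every member
                new[v] = value
        err = sum(abs(a - b) for a, b in zip(new, ranks))
        ranks = new
        if err < tol:
            break
    return ranks, evaluations


if __name__ == "__main__":
    # Vertices 1, 2 and 3 are in-identical: each is linked only from vertex 0.
    star = [[1, 2, 3], [0], [0], [0]]
    ranks, evaluations = pagerank_in_identical(star)
    print(ranks, evaluations)
```

On the star graph in the usage example there are four vertices but only two groups, so each iteration performs two rank evaluations instead of four. The saving on real graphs depends on how many vertices actually share in-links.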
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas