Data Engineering, St. Louis Big Data IDEA user group (Adam Doyle)
Modern-day Data Engineering requires creating reliable data pipelines, architecting distributed systems, designing data stores, and preparing data for other teams.
We’ll describe a year in the life of a Data Engineer who is tasked with creating a streaming data pipeline and touch on the skills necessary to set one up using Apache Spark.
Slides from the April 2019 meeting of the St. Louis Big Data IDEA meetup.
Modified version of Chapter 18 of the book Fundamentals of Database Systems, 6th Edition, with review questions, prepared as part of a database management systems course.
The Anatomy of a Large-Scale Hypertextual Web Search Engine (Mehul Boricha)
It explains the basics of how a search engine (Google) works. It covers the following topics:
+ Introduction
+ Challenges
+ Design Goals
+ PageRank
+ Google Architecture Overview
+ Major Data Structures
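Since PageRank is one of the topics covered, here is a minimal power-iteration sketch of the idea. The toy graph, function names, and damping value are illustrative, not Google's production implementation:

```python
# Minimal PageRank power-iteration sketch: rank flows along links
# each iteration until the distribution stabilizes.
def pagerank(links, damping=0.85, iters=50):
    """links: dict mapping page -> list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:  # dangling page: spread its rank over all pages
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

Note that the ranks always sum to 1: pages with more (and better-ranked) inbound links end up with higher scores.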
Indexing is used to speed up access to desired data.
E.g., the author catalog in a library.
A search key is an attribute or set of attributes used to look up records in a file; it is unrelated to the keys of the database schema.
An index file consists of records called index entries.
An index entry for key k may consist of
An actual data record (with search key value k)
A pair (k, rid) where rid is a pointer to the actual data record
A pair (k, bid) where bid is a pointer to a bucket of record pointers
Index files are typically much smaller than the original file if the actual data records are in a separate file.
If the index contains the data records, there is a single file with a special organization.
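The index-entry forms above can be sketched in a few lines. This toy in-memory example (data and names are illustrative) builds the bucket form, mapping each search-key value k to a list of record pointers:

```python
# The "data file": records identified by a record id (rid).
data_file = [
    {"rid": 0, "author": "Knuth", "title": "TAOCP"},
    {"rid": 1, "author": "Date", "title": "An Introduction to Database Systems"},
    {"rid": 2, "author": "Knuth", "title": "Concrete Mathematics"},
]

# Bucket-style index: search key k -> bucket of record pointers,
# i.e. the (k, bid) entry form described above. The index is much
# smaller than the data file since it holds only keys and rids.
index = {}
for rec in data_file:
    index.setdefault(rec["author"], []).append(rec["rid"])

def lookup(key):
    """Follow index entries to the actual data records."""
    return [data_file[rid] for rid in index.get(key, [])]
```

The (k, rid) form is the special case where each bucket holds exactly one pointer.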
Presentation I delivered for jsFoo, India's first JavaScript conference, held in Bangalore on Oct 1. I tried to recreate the Mix 11 talk on dataJS, explaining OData and dataJS, the open-source JavaScript library from Microsoft for accessing OData.
C Language Training in Ambala! Batra Computer Centre (Jatin Batra)
Batra Computer Centre is an ISO 9001:2008 certified training centre in Ambala. We provide C training in Ambala; training in C++, SEO, web designing, web development, and many other courses is also available.
MongoDB presentation for NYC Python's June meetup. Brief discussion on non-relational databases in general followed by an example of using MongoDB as a blog's backend
My talk at Barcamp Bangalore Spring 2014 on Redis. It covers what Redis is and its APIs, its architecture, and how to scale it up, as well as how Adnear is taking advantage of this great tool.
Ch 17: Disk Storage, Basic File Structures, and Hashing (Zainab Almugbel)
Modified version of Chapter 17 of the book Fundamentals of Database Systems, 6th Edition, with review questions, prepared as part of a database management systems course.
Other content at http://j.mp/bostonpython-continuum
Abstract:
We're pretty obsessed with Python-centric data analytics, and figuring out how to do this well led to the creation of Continuum Analytics. This talk will tell you about three pieces of that puzzle we have developed over the past year: Wakari, a web-based analytics platform leveraging IPython and IPython Notebook; Blaze, a next-generation NumPy; and Bokeh, providing interactive data visualization. All are available now for free as services or open-source projects.
The SBGrid Science Portal provides multi-modal access to computational infrastructure, data storage, and data analysis tools for the structural biology community. It incorporates features not previously seen in cyberinfrastructure science gateways. It enables researchers to securely share a computational study area, including large volumes of data and active computational workflows. A rich identity management system has been developed that simplifies federated access to US national cyberinfrastructure, distributed data storage, and high performance file transfer tools. It integrates components from the Virtual Data Toolkit, Condor, glideinWMS, the Globus Toolkit and Globus Online, the FreeIPA identity management system, Apache web server, and the Django web framework.
In the mid-1990s, the high-energy physics community (think FermiLab and CERN) started planning for the Large Hadron Collider. Managing the petabytes of data that would be generated by the facility and sharing it with the globally distributed community of over 10,000 researchers would be a major infrastructure and technology problem. This same community that brought us the web has now developed standards, software, and infrastructure for grid computing. In this seminar I'll present some of the exciting science that is being done on the Open Science Grid, the US national cyberinfrastructure linking 60 institutions (Harvard included) into a massive distributed computing and data processing system.
Adapting Federated Cyberinfrastructure for Shared Data Collection Facilities in Structural Biology (Boston Consulting Group)
Early-stage experimental data in structural biology is generally unmaintained and inaccessible to the public. It is increasingly believed that this data, which forms the basis for each macromolecular structure discovered by this field, must be archived and, in due course, published. Furthermore, the widespread use of shared scientific facilities such as synchrotron beamlines complicates the issue of data storage, access and movement, as does the increase of remote users. This work describes a prototype system that adapts existing federated cyberinfrastructure technology and techniques to significantly improve the operational environment for users and administrators of synchrotron data collection facilities used in structural biology. This is achieved through software from the Virtual Data Toolkit and Globus, bringing together federated users and facilities from the Stanford Synchrotron Radiation Lightsource, the Advanced Photon Source, the Open Science Grid, the SBGrid Consortium and Harvard Medical School. The performance and experience with the prototype provide a model for data management at shared scientific facilities.
Stokes-Rees, Ian, Ian Levesque, Frank V. Murphy, Wei Yang, Ashley Deacon, and Piotr Sliz. “Adapting Federated Cyberinfrastructure for Shared Data Collection Facilities in Structural Biology.” Journal of Synchrotron Radiation 19, no. 3 (April 6, 2012). http://scripts.iucr.org/cgi-bin/paper?S0909049512009776.
Working with thousands, millions, or billions of data records in high dimensions is increasingly becoming the reality for scientific research. What are some techniques to make this kind of data volume tractable? How can parallel computing help? In this talk I'll review data management tools and infrastructures, languages, and paradigms that help in this regard. In particular, I'll discuss Hadoop, MapReduce, Python, NumPy, and Globus Online to provide a survey of ways in which researchers can manage their data and process it in parallel.
Apache Arrow Workshop at VLDB 2019 / BOSS Session (Wes McKinney)
A technical deep dive for database system developers on the Arrow columnar format, binary protocol, C++ development platform, and Arrow Flight RPC.
See demo Jupyter notebooks at https://github.com/wesm/vldb-2019-apache-arrow-workshop
AWS July Webinar Series - Getting Started with Amazon DynamoDB (Amazon Web Services)
This webinar provides an overview of Amazon DynamoDB, a fast, flexible, and fully managed NoSQL database service for mobile, web, AdTech, IoT, and gaming applications that need consistent, single-digit-millisecond latency at any scale. The webinar will cover key topics around the general architecture of DynamoDB, data types, throughput provisioning, querying and indexing, and recent features.
The webinar includes a live demo of the basic operations used to read and write data to a DynamoDB table, and how the concept of provisioned IO affects the throughput of these operations.
Learning Objectives:
Enable users to understand how DynamoDB works so that they can evaluate and use DynamoDB as the data store for their application
Speaker: Xiaoyong Han, Solution Architect, AWS
Data collection and storage is a primary challenge for any big data architecture. In this webinar, gain a thorough understanding of AWS solutions for data collection and storage, and learn architectural best practices for applying those solutions to your projects. This session will also include a discussion of popular use cases and reference architectures. In this webinar, you will learn:
• Overview of the different types of data that customers are handling to drive high-scale workloads on AWS, and how to choose the best approach for your workload
• Optimization techniques that improve performance and reduce the cost of data ingestion
• Leveraging Amazon S3, Amazon DynamoDB, and Amazon Kinesis for storage and data collection
InterPlanetary Linked Data (IPLD) is the data layer for content-addressed systems and Web 3.0. It is a suite of technologies for representing and traversing hash-linked data. In this module you will understand:
- Why IPLD exists
- IPLD's fundamental concepts, such as Merkle DAGs and Merkle roots
- The relation of IPLD to IPFS
- How to use IPLD for distributed data structures
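The hash-linked idea behind Merkle DAGs can be sketched as follows. This is a simplified illustration, not the real IPLD API: it uses SHA-256 over canonical JSON as a stand-in for IPLD's CIDs and codecs such as DAG-CBOR:

```python
import hashlib
import json

# Content-addressed store: a node's address is the hash of its bytes.
store = {}

def put(node):
    """Store a node and return its content address (hash)."""
    blob = json.dumps(node, sort_keys=True).encode()  # canonical encoding
    cid = hashlib.sha256(blob).hexdigest()
    store[cid] = blob
    return cid

def get(cid):
    return json.loads(store[cid])

# Build a tiny Merkle DAG: the root links to its children by hash,
# so the root's address (the "Merkle root") commits to the whole tree.
leaf_a = put({"data": "hello"})
leaf_b = put({"data": "world"})
root = put({"links": [leaf_a, leaf_b]})
```

Because addresses are derived from content, identical nodes deduplicate automatically, and changing any leaf changes every hash on the path up to the root.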
Apache Spark's Built-in File Sources in Depth (Databricks)
In the Spark 3.0 release, all the built-in file source connectors (including Parquet, ORC, JSON, Avro, CSV, and Text) are re-implemented using the new Data Source API V2. We give a technical overview of how Spark reads and writes these file formats based on the user-specified data layouts. The talk also explains the differences between Hive SerDe and native connectors, and shares experiences on how to tune the connectors and choose the best data layouts for achieving the best performance.
Organizations often need to quickly analyze large amounts of data, such as logs generated from a wide variety of sources and formats. However, traditional approaches require a lot of time and effort designing complex data transformation and loading processes; and configuring data warehouses. Using AWS, you can start querying your datasets within minutes. In this session you will learn how you can deploy a managed Presto environment in minutes to interactively query log data using standard ANSI SQL. Presto is a popular open source SQL engine for running interactive analytic queries against data sources of all sizes. We will talk about common use cases and best practices for running Presto on Amazon EMR.
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS (Amazon Web Services)
The world is producing an ever increasing volume, velocity, and variety of big data. Consumers and businesses are demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing. AWS delivers many technologies for solving big data problems. But what services should you use, why, when, and how? In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
Why You Should Care about Data Layout in the File System, with Cheng Lian and … (Databricks)
Efficient data access is one of the key factors for having a high performance data processing pipeline. Determining the layout of data values in the filesystem often has fundamental impacts on the performance of data access. In this talk, we will show insights on how data layout affects the performance of data access. We will first explain how modern columnar file formats like Parquet and ORC work and explain how to use them efficiently to store data values. Then, we will present our best practice on how to store datasets, including guidelines on choosing partitioning columns and deciding how to bucket a table.
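As a rough illustration of the two layout choices the talk discusses, here is a plain-Python sketch (not Spark code; column names and data are made up): partitioning groups rows into one "directory" per distinct column value, while bucketing spreads rows across a fixed number of buckets by hashing a column.

```python
from collections import defaultdict

rows = [
    {"country": "US", "user_id": 101},
    {"country": "DE", "user_id": 102},
    {"country": "US", "user_id": 103},
]

def partition_by(rows, col):
    """Hive-style partitioning: one group per distinct column value."""
    parts = defaultdict(list)
    for r in rows:
        parts[f"{col}={r[col]}"].append(r)
    return dict(parts)

def bucket_by(rows, col, n_buckets):
    """Bucketing: a fixed number of groups chosen by hashing the column."""
    buckets = defaultdict(list)
    for r in rows:
        buckets[hash(r[col]) % n_buckets].append(r)
    return dict(buckets)

layout = partition_by(rows, "country")
```

The practical guideline mirrors this shape: partition on low-cardinality columns you filter by (so whole groups can be skipped), and bucket on high-cardinality join keys (so the group count stays bounded).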
Data Con LA 2022 - What's New with MongoDB 6.0 and Atlas (Data Con LA)
Sig Narvaez, Executive Solution Architect at MongoDB
MongoDB is now a developer data platform. Come learn what's new in the 6.0 release and Atlas, following all the recent announcements made at MongoDB World 2022. Topics will include:
- Atlas Search which combines 3 systems into one (database, search engine, and sync mechanisms) letting you focus on your product's differentiation.
- Atlas Data Federation to seamlessly query, transform, and aggregate data from one or more MongoDB Atlas databases, Atlas Data Lake and AWS S3 buckets
- Queryable Encryption lets you run expressive queries on fully randomized encrypted data to meet the most stringent security requirements
- Relational Migrator which analyzes your existing relational schemas and helps you design a new MongoDB schema.
- And more!
Apache Big Data 2016 - Speaking the Language of Big Data (techmaddy)
With the advent of feature-based teams, software architecture styles like microservices and deployment patterns like DevOps are taking over. Each team takes autonomous decisions on the technologies used, but there is always a need to define a common language for the services to communicate with each other. This gives a common wire format and avoids a lot of mappers across the application. The other common scenario is in big data projects, where the cluster nodes need to communicate efficiently and effectively, with an easy API.
This talk highlights Apache Avro and Apache Thrift, which are used in big data solutions as a common language across different services and nodes. These technologies act as a language- and platform-neutral way of serializing structured data. The talk also shows examples and demos highlighting the pain points they solve.
Big Data Adoption Success Using AWS Big Data Services - Pop-up Loft TLV 2017 (Amazon Web Services)
In today's session we will share an overview of the typical challenges when adopting Big Data, and how the AWS Big Data platform allows you to tackle these challenges and leverage the right analytical/Big Data solutions in order to succeed with your strategy (whiteboard presentation).
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Version (Felix Gessert)
The unprecedented scale at which data is consumed and generated today has shown a large demand for scalable data management and given rise to non-relational, distributed "NoSQL" database systems. Two central problems triggered this process: 1) vast amounts of user-generated content in modern applications and the resulting request loads and data volumes; 2) the desire of the developer community to employ problem-specific data models for storage and querying. To address these needs, various data stores have been developed by both industry and research, arguing that the era of one-size-fits-all database systems is over. The heterogeneity and sheer number of these systems - now commonly referred to as NoSQL data stores - make it increasingly difficult to select the most appropriate system for a given application. Therefore, these systems are frequently combined in polyglot persistence architectures to leverage each system in its respective sweet spot. This tutorial gives an in-depth survey of the most relevant NoSQL databases to provide comparative classification and highlight open challenges. To this end, we analyze the approach of each system to derive its scalability, availability, consistency, data modeling and querying characteristics. We present how each system's design is governed by a central set of trade-offs over irreconcilable system properties. We then cover recent research results in distributed data management to illustrate that some shortcomings of NoSQL systems could already be solved in practice, whereas other NoSQL data management problems pose interesting and unsolved research challenges.
If you'd like to use these slides for e.g. teaching, contact us at gessert at informatik.uni-hamburg.de - we'll send you the PowerPoint.
Categorisation of databases based on how they store data. Pictorial depiction of partitioning and replication of data, and a grouping of prominent databases used in the industry.
In this session, you will learn how to easily access your data on S3, and how to visualize and generate insights from Amazon Athena and other data sources through Amazon QuickSight. In addition, we will share some tips and best practices for using Athena and QuickSight.
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.
Amazon QuickSight is a fast, cloud-powered business analytics service that makes it easy to build visualizations, perform ad-hoc analysis, and quickly get business insights from various data sources (Amazon Redshift, Amazon Athena, Amazon EMR, Amazon RDS, and more).
Leading organizations today all have data scientists and analytics teams. A key challenge is establishing cross-functional teams that can collaboratively derive insights from data and move exploratory interactive analytics into automated production systems. Boston Consulting Group, founded on quantitative decision making, guides global F500 companies in the technical and organizational structures that will provide a foundation for agility, innovation, and competitive advantage. This talk will outline key strategies for building effective cloud-native analytics teams.
Keynote at Gateways 2017 Conference, Ann Arbor MI
Speaker: Ian Stokes-Rees
"Connecting Cyberinfrastructure Back To The Laptop"
Science Gateways today are generally built to provide a web-accessible interface for a particular scientific community to access a combination of software, hardware, and data deployed in an expertly managed computing center. But what happens when the scientist wants to repatriate their data? Or perform some analysis that is not supported by the gateway? Both for the purposes of encouraging innovative workflows and serving an audience with a wide range of computational experience it is important to consider how a gateway can fit into the broader computational ecosystem of a particular researcher or research group. One simple starting point for this is to ask the question "how can the gateway connect back to the laptop?". This talk will consider how this is being done today in science gateways and present some ideas for how this could be expanded in the future.
A talk from AnacondaCON presenting my personal journey from physics to finance to biology and how collaborative team-based data science has been the big enabler. The talk looks at Python, Big Data, Jupyter Notebooks, Anaconda. Discusses CERN LHCb particle physics computing, protein structure determination, and patterns in data science.
Harvard HPC Seminar Series
Theresa Kaltz, PhD, High Performance Technical Computing, FAS, Harvard
Due to the wide availability and low cost of high speed networking, commodity clusters have become the de facto standard for building high performance parallel computing systems. This talk will introduce the leading technology for high speed interconnects called Infiniband and compare its deployment and performance to Ethernet. In addition, some emerging interconnect technologies and trends in cluster networking will be discussed.
UiPath Test Automation using UiPath Test Suite series, part 4 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova… (Ramesh Iyer)
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes much work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Neuro-symbolic is not enough, we need neuro-*semantic* (Frank van Harmelen)
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Kubernetes & AI - Beauty and the Beast!?! @ KCD Istanbul 2024 (Tobias Schneck)
As AI technology pushes into IT, I was wondering, as an "infrastructure container Kubernetes guy", how does this fancy AI technology get managed from an infrastructure operations point of view? Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide a short journey through existing deployment models and use cases for AI software. With practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could be beneficial for, or limiting, your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I already have working for real.
Transcript: Selling digital books in 2024: Insights from industry leaders - T… (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Key Trends Shaping the Future of Infrastructure (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud, and open source: exploring how these areas are likely to mature and develop over the short and long term, and how organisations can position themselves to adapt and thrive.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities, spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on countries – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
4. Blaze: Different Kinds of Arrays
(Diagram: Blaze arrays are indexable; an NDTable has a record type and an NDArray a primitive type; each can be either deferred or concrete.)
5. Blaze Deferred Arrays
• Symbolic objects which build a graph
• Represent deferred computation
• Usually what you have when you have a Blaze array
(Diagram: expression tree for A + B*C, with a "+" root whose children are A and a "*" node over B and C.)
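The deferred-graph idea on this slide can be sketched with a toy class. This is not the real Blaze API, just an illustration of how operator overloading builds an expression tree that is only evaluated on demand:

```python
class Deferred:
    """Symbolic node: arithmetic builds a graph instead of computing."""
    def __init__(self, op, args):
        self.op, self.args = op, args

    def __add__(self, other):
        return Deferred("+", [self, other])

    def __mul__(self, other):
        return Deferred("*", [self, other])

    def evaluate(self, env):
        """Walk the graph, substituting concrete values from env."""
        if self.op == "leaf":
            return env[self.args[0]]
        left, right = (a.evaluate(env) for a in self.args)
        return left + right if self.op == "+" else left * right

def array(name):
    return Deferred("leaf", [name])

# A + B*C builds the same tree as on the slide: a "+" root whose
# children are A and a "*" node over B and C.
A, B, C = array("A"), array("B"), array("C")
expr = A + B * C
```

Holding the graph rather than the result is what lets a system like Blaze pick an execution strategy later, e.g. streaming chunks out-of-core as the next slide describes.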
6. Deferred Allows Handling Large Arrays
• Can be handled out-of-core, using chunks to stream through memory
7. Blaze Concrete Array
• Data Descriptor: where are the bytes? (URLs and indexes)
• DataShape: what do the bytes mean? An extensible type system which includes shape
• MetaData: a dictionary of labels, provenance, etc.
15. Advanced Types
Parametrized types:
  type SquareMatrix T = N, N, T
  type Point = { x : int; y : int }
Alias types:
  type IntMatrix N = N, N, int32
  type Space = { a : Point; b : Point }
Example: 5, 10, Space
17. Execution Model
• Graphs dispatch to specialized library code that is "registered with the system" based on the type and metadata of the array (Blaze modules)
• Many operations can be compiled with LLVM to machine code
  - BLIR (simple typed expression syntax)
  - Numba (Python compiler)