How are businesses using the lakehouse, and why should you care?

If you’re familiar with the term data warehouse, you’ll know that it’s a system for storing structured data for business intelligence and reporting purposes. However, as businesses have started to appreciate the value of unstructured data, such as images, videos, and voice recordings, a new type of framework called the data lake has emerged. While the data lake is a powerful and flexible infrastructure for storing unstructured data, it lacks certain critical features, such as transaction support and data quality enforcement, leading to data inconsistency.

To address these issues, a hybrid architecture was needed that could store both structured and unstructured data. This led to the development of the data lakehouse, which unifies structured and unstructured data in a single repository. Organizations that work with unstructured data can benefit from having a single data repository instead of requiring both a warehouse and a lake architecture.

Data lakehouses allow structure and schema, like those used in a data warehouse, to be applied to the unstructured data that is typically stored in a data lake. This enables data users, such as data scientists, to access information more quickly and efficiently. Intelligent metadata layers can also be used to categorize and classify the data, enabling it to be cataloged and indexed like structured data.

Data lakehouses are particularly well suited to organizations that want to move from business intelligence (BI) to artificial intelligence (AI) in their data-driven operations and decision-making. They are cheaper to scale than data warehouses, and they can be queried with a wide range of tools rather than only with applications built for structured data, such as SQL-based BI tools.

As more organizations recognize the value of using unstructured data together with AI and machine learning, data lakehouses are becoming increasingly popular. They represent a step up in maturity from the combined data lake and data warehouse model, and they will likely become simpler, more cost-efficient, and more capable of serving diverse data applications over time.


Last Updated on February 18, 2023 by SK

Weekday Readings – Aug 24, 2022

  • What is Podman? The container engine replacing Docker. Podman is a container engine—a tool for developing, managing, and running containers and container images. Containers are standardized, self-contained software packages that hold all the elements necessary to run anywhere without the need for customization, including application code and supporting libraries. (InfoWorld)
  • The strange case of Nakamoto’s bitcoin – Part 1. As Bitcoin, and cryptocurrencies in general, claim to be investments, yet have no underlying sources of revenue, many have viewed them with suspicion. Some argue that Bitcoin is a Ponzi, while others counter that the comparison is erroneous as it shares traits with a pyramid scheme. Surprisingly, despite intense scrutiny, Bitcoin has defied precise categorization as a specific form of investment fraud, leading some proponents to suggest that, as a result, it should be cleared of all charges: “If it looks like a duck, but honks like a goose, then it can’t be either.” (Salbayat-Blog)
  • Google tried to prove managers don’t matter. Instead, it discovered 10 traits of the very best ones. The hypothesis was that the quality of a manager doesn’t matter and that managers are at best a necessary evil, and at worst a useless layer of bureaucracy. The early work of Project Oxygen, in 2002, included a radical experiment — a move to a flat organization without any managers. The experiment was a disaster, lasting only a few months as the search giant found employees were left without direction and guidance on their most basic questions and needs. (Inc.com)
  • Big Tech braces for “big lie” in 2022 midterms. Tech companies were caught flat-footed by the deluge of disinformation aimed at delegitimizing the election process and outcome in 2020. Now, amid intense regulatory scrutiny, they are trying to get ahead of a repeat. (Axios)
  • You have no idea how good mosquitoes are at smelling us. Their recent work shows that mosquitoes’ odor-detecting systems are, unlike many other animals’, patchwork, chaotic, and riddled with fail-safes that make the insects’ sense of smell extraordinarily difficult to stump. It’s an essential adaptation for a creature that is hyper-focused on us. (The Atlantic)

Last Updated on August 25, 2022 by SK

Headless Plex Client using HiFiBerry and Raspberry Pi 3

Plex Media Server is a great way to manage a personal media collection (MP3 music, family videos, and photos). I’ve been using Plex Media Server on a Synology NAS for quite some time to manage a sizeable collection of music, videos, and photos.

I stream my personal collection, especially some of my favorite Indian musicians, through the living room and bedroom stereos (powered by a Trends Audio Class-T TA 10.1 amplifier connected to a pair of Axiom speakers). This stereo system ran “Rasplex” for quite some time on a Raspberry Pi with a HifiBerry DAC+ Pro board, but lately I had been running into frustrating issues and bugs, so I was looking for alternatives that work seamlessly with Plex to stream audio to my stereo. Recently, I stumbled across a project promoted and implemented by the Plex CTO to run PlexAmp on a Raspberry Pi.

This guide shows how I set up a headless PlexAmp client using the latest build from Plex.

Hardware

HifiBerry Board

Install Ubuntu Server on Raspberry Pi

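  • Install the OS in server mode, with no graphical display. Note that version 11 (bullseye) is the Debian release that Raspberry Pi OS is based on; Ubuntu Server uses its own release names.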
    • Follow the steps in this page to install Ubuntu Server on the Raspberry Pi 3 (Link) or here.
  • Install the SSH server on the Ubuntu server:
    • sudo apt-get install openssh-server
    • Then enable it by running the following, so that the SSH server starts whenever the system reboots:
      • sudo systemctl enable --now ssh

Configure and Enable HifiBerry Board

  • Follow instructions here to configure and enable the HifiBerry board in Ubuntu.
    • HifiBerry drivers are already included in the Linux kernel for Raspberry Pi OS, so you just need to follow the instructions below.
    • Edit the /boot/config.txt file (for example, “sudo vi /boot/config.txt”):
      • Comment out the “dtparam=audio=on” line (put a “#” in front of it).
      • Add audio=off to the “Enable DRM VC4 V3D driver” section, with no space after the comma:
        • “dtoverlay=vc4-kms-v3d,audio=off” (as far as I can tell, this disables the built-in HDMI audio)
      • Add the following lines to the same file:
        • dtoverlay=hifiberry-dacplus
        • force_eeprom_read=0
  • Then create the /etc/asound.conf file with the following:
                   pcm.!default {
                      type hw card 0
                   }
                   ctl.!default {
                      type hw card 0
                   }
  • Reboot (“sudo reboot”).
  • Run “aplay -l”.
    • You should see the following output:
    • card 0: sndrpihifiberry….
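
The config.txt edits above are easy to get slightly wrong, so a quick check before rebooting can save a round trip. The following is a minimal sketch, not part of the HifiBerry instructions: check_hifiberry_config is a hypothetical helper name, and it only looks for the three settings described above.

```shell
#!/bin/sh
# Sketch: check that a Raspberry Pi config file contains the HifiBerry
# settings described above. check_hifiberry_config is a hypothetical helper.
check_hifiberry_config() {
    config_file="${1:-/boot/config.txt}"
    status=0

    # The onboard audio line must be absent or commented out.
    if grep -Eq '^[[:space:]]*dtparam=audio=on' "$config_file"; then
        echo "onboard audio is still enabled (dtparam=audio=on)"
        status=1
    fi

    # Both HifiBerry lines must be present and uncommented.
    for setting in 'dtoverlay=hifiberry-dacplus' 'force_eeprom_read=0'; do
        if ! grep -Eq "^[[:space:]]*${setting}[[:space:]]*\$" "$config_file"; then
            echo "missing: ${setting}"
            status=1
        fi
    done

    return "$status"
}
```

Running check_hifiberry_config /boot/config.txt prints one line per problem and returns a non-zero status if anything is off. After the reboot, piping “aplay -l” through “grep -i hifiberry” is a quick way to confirm the card is listed.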

Install PlexAmp

Now that we’ve installed the OS and the HifiBerry board is enabled and configured, the next step is to install and enable the PlexAmp client. As far as I can tell, there is no guide to cleanly install and debug the client. I believe this software is still actively developed, but I was able to get it working with simple tweaks. Here is what I did:

  • To install PlexAmp (in server mode, with no graphics), I followed the instructions here.
  • First, install Node.js:
    • sudo apt install nodejs
  • Download the latest headless PlexAmp client for Raspberry Pi.
  • Untar the file:
    • tar -xvf Plexamp….tar.bz2
  • Go to the “plexamp” directory:
    • cd plexamp
  • Now start the client with Node.js:
    • node js/index.js
    • This starts the PlexAmp client scripts, which run under Node.js.
  • At this point, make sure you can see the “Raspberry PI” (screenshot below) in your iPhone or Android Plex client.
Check to make sure “hifiberry” shows up in the clients.
  • Once you confirm that the PlexAmp client shows up in the iPhone app, you know the install and configuration were successful.
  • The next step is to make sure the PlexAmp client starts automatically when the Raspberry Pi boots up. To do that, you need to run PlexAmp as a service:

sudo cp plexamp.service /lib/systemd/system/
sudo systemctl daemon-reload 
sudo systemctl enable plexamp
sudo systemctl start plexamp
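
The plexamp.service file copied above comes with the PlexAmp download. If its user or paths do not match your system, a unit along the following lines is one way to describe the same service. This is only a sketch under stated assumptions: make_plexamp_unit is a hypothetical helper, the user name, install directory, and node path are placeholders you supply, and the shipped plexamp.service may look different.

```shell
#!/bin/sh
# Sketch: print a minimal systemd unit for a headless PlexAmp install.
# make_plexamp_unit and its arguments are hypothetical; the plexamp.service
# file shipped with PlexAmp may differ from this.
make_plexamp_unit() {
    run_user="$1"                   # account that runs the client
    install_dir="$2"                # directory created by untarring PlexAmp
    node_bin="${3:-/usr/bin/node}"  # assumed location of the node binary

    cat <<EOF
[Unit]
Description=PlexAmp headless client
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=${run_user}
WorkingDirectory=${install_dir}
ExecStart=${node_bin} ${install_dir}/js/index.js
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

For example, make_plexamp_unit ubuntu /home/ubuntu/plexamp prints a unit that you can review and then save as /lib/systemd/system/plexamp.service before running the daemon-reload, enable, and start commands. Afterwards, “systemctl status plexamp” and “journalctl -u plexamp” show whether the client started and what it logged.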

That’s it. You should now be able to open the Plex app on your iPhone, select PlexAmp as the client, and play audio through your home stereo. Enjoy!

If you are curious.. here is my setup:


Last Updated on August 25, 2022 by SK

Weekend Readings – Aug 21, 2022

  • A frustrating hassle holding electric cars back: Broken Chargers. This has definitely happened to me personally: I can’t reliably identify working chargers for my Nissan Leaf. It is frustrating to drive to a remote location using the apps provided by charging networks, only to find that the charging station doesn’t work or, in the worst case, that the entire station has been “sold” to a new charging company. (NYT)
  • Drinking the Kool-Aid (How billions were lost creating PDAs). Why did so many invest so much money and time into the development of PDAs? It’s a fascinating question. Besides the key notion of portability, computers were considered too difficult to use, which restricted the size of the market. The PDA was meant to be as easy to use as paper but as powerful as a computer; humans would interact with the device through a pen on a small screen. Handwriting was considered a more natural way to interact, especially because so few actually knew how to type. (Two Thirds Done – Blog)
  • The Mysterious Dance of the Cricket Embryos. Humans, frogs and many other widely studied animals start as a single cell that immediately divides again and again into separate cells. In crickets and most other insects, initially just the cell nucleus divides, forming many nuclei that travel throughout the shared cytoplasm and only later form cellular membranes of their own. (NYT)
  • How do I become a data scientist? Our educational institutions trained us to think that’s how you learn things. It might eventually work, too — but it’s an unnecessarily inefficient process. Some programs have capstone projects (often using curated, clean data sets with a clear purpose, which sounds good but it’s not). Many recognize there’s no substitute for ‘learning on the job’ — but how do you get that data science job in the first place? (Monica Rogati)
  • The coming California Megastorm. Unlike a giant earthquake, the other “Big One” threatening California, an atmospheric river superstorm will not sneak up on the state. Forecasters can now spot incoming atmospheric rivers five days to a week in advance, though they don’t always know exactly where they’ll hit or how intense they’ll be. (NYT)

Last Updated on August 21, 2022 by SK

Weekend Readings: Dec 12, 2021

  • Apple’s Long Journey to the M1 Pro Chip. Apple’s M1 Pro/Max is the second step in a major change in computing. What might be seen as an evolution from iPhone/ARM is really part of an Apple story that began in 1991 with PowerPC. And what a story of innovation. (learningbyshipping)
  • Is this how your brain works? Machine learning has incredible promise. I believe that in the coming decades we will produce machines that have the kind of broad, flexible “general intelligence” that would enable them to help us address truly complex, multifaceted challenges like improving medicine through a more advanced understanding of how proteins fold. Nothing we call AI today has anything like that kind of intelligence. (GatesNotes).
  • In a First, Physicists Glimpse a Quantum Ghost. A wave function is not something one can hold in their hand or put under a microscope. And confusingly, some of its properties simply seem not to be real. In fact, mathematicians would openly label them as imaginary: so-called imaginary numbers—which arise from seemingly nonsensical feats such as taking the square roots of negative integers—are an important ingredient of a wave function’s well-proved power to forecast the results of real-world experiments. In short, if a wave function can be said to “exist” at all, it does so at the hazy crossroads between metaphysical mathematics and physical reality. (Scientific American).
  • Addressing the structural foundations of homelessness in the Bay Area. The severity of the Bay Area’s homelessness crisis is visible everywhere—from the tents that crowd under freeways to the increasing number of people sleeping on sidewalks and in doorways. Largely hidden from view, however, are the 457,000 extremely low-income (ELI) households in the region who are making ends meet on an average of $18,000 a year. Over half of ELI households are precariously housed, meaning that they don’t receive any housing assistance and pay more than 30 percent of their income for housing. These households—which include seniors living on fixed incomes, single parents juggling work and child care responsibilities, and essential workers making poverty wages—are at significant risk of housing insecurity and homelessness. (Berkeley blog).
  • The futuristic plan to fix America’s power grid. One of the most important fixes would be physically “hardening” the grid, which means replacing old infrastructure that’s vulnerable to extreme weather with stronger, more resilient upgrades. These are the kinds of solutions you might notice if they pop up in your neighborhood, perhaps in the form of swapping out wooden electric poles for wind-resistant steel or concrete ones, moving power lines underground, or lifting ground-level transformers out of the path of potential floods. (Recode)


Last Updated on December 12, 2021 by SK

Mid-week Readings. Dec 8, 2021

  • Everyone Is Talking About Data Science. Here’s How J.P. Morgan Is Putting It Into Practice. Paul Quinsee, J.P. Morgan Asset Management’s global head of equities, thought he knew the skills that turned analysts into stars. Like the talent scouts in Moneyball, Michael Lewis’s bestselling book on how data science changed baseball, Quinsee had been watching fundamental research analysts play their game — albeit in less dusty fields — for almost four decades. (Institutional Investor).
  • The Dark Side of 15-Minute Grocery Delivery. Over the last year, cities across the U.S. and Europe have seen a rapid rise in the number of dark stores — mini-warehouses stocked with groceries to be delivered in 15 minutes or less. Operated by well-funded startups such as Getir, Gopuff, Jokr and Gorillas, dark stores are quietly devouring retail spaces, transforming them into minimally staffed distribution centers closed to the public. In New York City, where seven of these services are currently competing for market share (including new entrant DoorDash), these companies have occupied dozens of storefronts since July, with expansion plans calling for hundreds more in that city alone. (Bloomberg)
  • Can Apple Take Down the World’s Most Notorious Spyware Company? If Apple were to win this case, it would deal a strong blow against malicious spyware operators, state-sponsored hacking, and the global oppression of democracy activists. However, if defendants were to somehow prevail, it could send a signal that we have entered a new age in which technological pirates are free to run amok without fear of judicial intervention. (Slate)
  • Why you should care about Facebook’s big push into the metaverse. Many critics and skeptics have mocked Zuckerberg’s plan to change Facebook from a social media company to a metaverse company. Some critics say that by focusing on the metaverse and renaming itself while the company is reeling from a PR crisis, Facebook is distracting from the problems it creates or contributes to in the real world: issues like harming teens’ mental health, facilitating the spread of disinformation, and fueling political polarization. (Vox)

Last Updated on December 8, 2021 by SK