How are businesses using the lakehouse, and why should you care?

If you’re familiar with the term data warehouse, you’ll know that it’s a system for storing structured data for business intelligence and reporting purposes. However, as businesses have started to appreciate the value of unstructured data, such as images, videos, and voice recordings, a new type of framework called the data lake has emerged. While the data lake is a powerful and flexible infrastructure for storing unstructured data, it lacks certain critical features, such as transaction support and data quality enforcement, leading to data inconsistency.

To address these issues, a hybrid architecture was needed that could store both structured and unstructured data. This led to the development of the data lakehouse, which unifies structured and unstructured data in a single repository. Organizations that work with unstructured data can benefit from having a single data repository instead of requiring both a warehouse and a lake architecture.

Data lakehouses allow structure and schema, like those used in a data warehouse, to be applied to the unstructured data typically stored in a data lake. This enables data users, such as data scientists, to access information more quickly and efficiently. Intelligent metadata layers can also be used to categorize and classify the data, enabling it to be cataloged and indexed like structured data.
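To make the idea of a metadata layer concrete, here is a minimal sketch in Python. It is a toy illustration, not how any particular lakehouse product works: the `MetadataCatalog` class, the field names in `REQUIRED_FIELDS`, and the file paths are all hypothetical. It shows two things described above: enforcing a schema when a record is written, and indexing unstructured files by tag so they can be looked up like structured data.

```python
from dataclasses import dataclass, field

# Hypothetical schema for metadata describing unstructured objects.
REQUIRED_FIELDS = {"path": str, "media_type": str, "tags": list}


@dataclass
class MetadataCatalog:
    """Toy metadata layer: validates records on write and indexes them by tag."""
    records: list = field(default_factory=list)
    by_tag: dict = field(default_factory=dict)

    def register(self, record: dict) -> None:
        # Schema enforcement: reject records with missing or mistyped fields.
        for name, expected in REQUIRED_FIELDS.items():
            if not isinstance(record.get(name), expected):
                raise ValueError(f"field {name!r} must be {expected.__name__}")
        self.records.append(record)
        # Index the file path under each tag for fast lookup.
        for tag in record["tags"]:
            self.by_tag.setdefault(tag, []).append(record["path"])

    def find(self, tag: str) -> list:
        # Indexed lookup, similar in spirit to querying a structured table.
        return list(self.by_tag.get(tag, []))


catalog = MetadataCatalog()
catalog.register({"path": "calls/0001.wav", "media_type": "audio",
                  "tags": ["support", "complaint"]})
catalog.register({"path": "scans/invoice-17.png", "media_type": "image",
                  "tags": ["invoice"]})
print(catalog.find("support"))
```

A record without a `tags` list is rejected with a `ValueError`, which is the kind of data quality check a plain data lake does not perform on its own.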

Data lakehouses are particularly well-suited to inform data-driven operations and decision-making by organizations that want to move from business intelligence (BI) to artificial intelligence (AI). They are typically cheaper to scale than data warehouses and can be queried with a wide range of tools, rather than only with applications built for structured data, such as SQL clients.

As more organizations recognize the value of using unstructured data together with AI and machine learning, data lakehouses are becoming increasingly popular. They represent a step up in maturity from the combined data lake and data warehouse model, and they will likely become simpler, more cost-efficient, and more capable of serving diverse data applications over time.

Weekday Readings – Aug 24, 2022

  • What is Podman? The container engine replacing Docker. Podman is a container engine—a tool for developing, managing, and running containers and container images. Containers are standardized, self-contained software packages that hold all the elements necessary to run anywhere without the need for customization, including application code and supporting libraries.  (InfoWorld)
  • The strange case of Nakamoto’s bitcoin – Part 1. As Bitcoin, and cryptocurrencies in general, claim to be investments, yet have no underlying sources of revenue, many have viewed them with suspicion. Some argue that Bitcoin is a Ponzi, while others counter that the comparison is erroneous as it shares traits with a pyramid scheme. Surprisingly, despite intense scrutiny, Bitcoin has defied precise categorization as a specific form of investment fraud, leading some proponents to suggest that, as a result, it should be cleared of all charges, “If it looks like a duck, but honks like a goose, then it can’t be either”. (Salbayat-Blog)
  • Google tried to prove managers don’t matter. Instead, it discovered 10 traits of the very best ones. The hypothesis was that the quality of a manager doesn’t matter and that managers are at best a necessary evil, and at worst a useless layer of bureaucracy. The early work of Project Oxygen, in 2002, included a radical experiment — a move to a flat organization without any managers. The experiment was a disaster, lasting only a few months as the search giant found employees were left without direction and guidance on their most basic questions and needs. (Inc.com)
  • Big Tech braces for “big lie” in 2022 midterms. Tech companies were caught flat-footed by the deluge of disinformation aimed at delegitimizing the election process and outcome in 2020. Now, amid intense regulatory scrutiny, they are trying to get ahead of a repeat. (Axios)
  • You have no idea how good mosquitoes are at smelling us. Their recent work shows that mosquitoes’ odor-detecting systems are, unlike many other animals’, patchwork, chaotic, and riddled with fail-safes that make the insects’ sense of smell extraordinarily difficult to stump. It’s an essential adaptation for a creature that is hyper-focused on us. (The Atlantic)

Weekend Readings – Aug 21, 2022

  • A frustrating hassle holding electric cars back: Broken Chargers. This has happened to me personally: I can’t reliably identify working chargers for my Nissan Leaf. It is frustrating to drive to a remote location using the apps provided by charging networks and find that the charging station either doesn’t work or, in the worst case, has been “sold” to a new charging company. (NYT)
  • Drinking the kool-Air (How billions were lost creating PDAs). Why did so many invest so much money and time into the development of PDAs? It’s a fascinating question. Besides the key notion of portability, computers were considered too difficult to use, which restricted the size of the market. The PDA was meant to be as easy to use as paper but as powerful as a computer; humans would interact with the device through a pen on a small screen. Handwriting was considered a more natural way to interact, especially because so few actually knew how to type. (Two Thirds Done – Blog)
  • The Mysterious Dance of the Cricket Embryos. Humans, frogs and many other widely studied animals start as a single cell that immediately divides again and again into separate cells. In crickets and most other insects, initially just the cell nucleus divides, forming many nuclei that travel throughout the shared cytoplasm and only later form cellular membranes of their own. (NYT)
  • How do I become a data scientist? Our educational institutions trained us to think that’s how you learn things. It might eventually work, too — but it’s an unnecessarily inefficient process. Some programs have capstone projects (often using curated, clean data sets with a clear purpose, which sounds good but it’s not). Many recognize there’s no substitute for ‘learning on the job’ — but how do you get that data science job in the first place? (Monica Rogati)
  • The coming California Megastorm. Unlike a giant earthquake, the other “Big One” threatening California, an atmospheric river superstorm will not sneak up on the state. Forecasters can now spot incoming atmospheric rivers five days to a week in advance, though they don’t always know exactly where they’ll hit or how intense they’ll be. (NYT)

Weekend Readings: Dec 12, 2021

  • Apple’s Long Journey to the M1 Pro Chip. Apple’s M1 Pro/Max is the second step in a major change in computing. What might be seen as an evolution from iPhone/ARM is really part of an Apple story that began in 1991 with PowerPC. And what a story of innovation. (learningbyshipping)
  • Is this how your brain works? Machine learning has incredible promise. I believe that in the coming decades we will produce machines that have the kind of broad, flexible “general intelligence” that would enable them to help us address truly complex, multifaceted challenges like improving medicine through a more advanced understanding of how proteins fold. Nothing we call AI today has anything like that kind of intelligence. (GatesNotes).
  • In a First, Physicists Glimpse a Quantum Ghost. A wave function is not something one can hold in their hand or put under a microscope. And confusingly, some of its properties simply seem not to be real. In fact, mathematicians would openly label them as imaginary: so-called imaginary numbers—which arise from seemingly nonsensical feats such as taking the square roots of negative integers—are an important ingredient of a wave function’s well-proved power to forecast the results of real-world experiments. In short, if a wave function can be said to “exist” at all, it does so at the hazy crossroads between metaphysical mathematics and physical reality. (Scientific American).
  • Addressing the structural foundations of homelessness in the Bay Area. The severity of the Bay Area’s homelessness crisis is visible everywhere—from the tents that crowd under freeways to the increasing number of people sleeping on sidewalks and in doorways. Largely hidden from view, however, are the 457,000 extremely low-income (ELI) households in the region who are making ends meet on an average of $18,000 a year. Over half of ELI households are precariously housed, meaning that they don’t receive any housing assistance and pay more than 30 percent of their income for housing. These households—which include seniors living on fixed incomes, single parents juggling work and child care responsibilities, and essential workers making poverty wages—are at significant risk of housing insecurity and homelessness. (Berkeley blog).
  • The futuristic plan to fix America’s power grid. One of the most important fixes would be physically “hardening” the grid, which means replacing old infrastructure that’s vulnerable to extreme weather with stronger, more resilient upgrades. These are the kinds of solutions you might notice if they pop up in your neighborhood, perhaps in the form of swapping out wooden electric poles for wind-resistant steel or concrete ones, moving power lines underground, or lifting ground-level transformers out of the path of potential floods.  (Recode)

Mid-week Readings. Dec 8, 2021

  • Everyone Is Talking About Data Science. Here’s How J.P. Morgan Is Putting It Into Practice. Paul Quinsee, J.P. Morgan Asset Management’s global head of equities, thought he knew the skills that turned analysts into stars. Like the talent scouts in Moneyball, Michael Lewis’s bestselling book on how data science changed baseball, Quinsee had been watching fundamental research analysts play their game — albeit in less dusty fields — for almost four decades. (Institutional Investor).
  • The Dark Side of 15-Minute Grocery Delivery. Over the last year, cities across the U.S. and Europe have seen a rapid rise in the number of dark stores — mini-warehouses stocked with groceries to be delivered in 15 minutes or less. Operated by well-funded startups such as Getir, Gopuff, Jokr and Gorillas, dark stores are quietly devouring retail spaces, transforming them into minimally staffed distribution centers closed to the public. In New York City, where seven of these services are currently competing for market share (including new entrant DoorDash), these companies have occupied dozens of storefronts since July, with expansion plans calling for hundreds more in that city alone. (Bloomberg)
  • Can Apple Take Down the World’s Most Notorious Spyware Company? If Apple were to win this case, it would deal a strong blow against malicious spyware operators, state-sponsored hacking, and the global oppression of democracy activists. However, if defendants were to somehow prevail, it could send a signal that we have entered a new age in which technological pirates are free to run amok without fear of judicial intervention. (Slate)
  • Why you should care about Facebook’s big push into the metaverse. Many critics and skeptics have mocked Zuckerberg’s plan to change Facebook from a social media company to a metaverse company. Some critics say that by focusing on the metaverse and renaming itself while the company is reeling from a PR crisis, Facebook is distracting from the problems it creates or contributes to in the real world: issues like harming teens’ mental health, facilitating the spread of disinformation, and fueling political polarization. (Vox)

Weekend Readings – Nov 21, 2021

  • Microsoft and Metaverse: It’s certainly the question of the season: what is the Metaverse? Here is the punchline: the Metaverse already exists, it just happens to be called the Internet. For well over a year a huge portion of people’s lives was primarily digital. The primary way to connect with friends and family was via video calls or social networking; the primary means of entertainment was streaming or gaming; for white collar workers their jobs were online as well. This certainly wasn’t ideal: the first thing people want to do as the world opens up is see their friends and family in person, go to a movie or see a football game as a collective, or take a trip. Work, though, has been a bit slower to come back: even if the office is open, many meetings are still online given that some of the team may be working remote — for many companies, permanently. (Stratechery)
  • Even though electric and self-driving cars have yet to saturate the market, dozens of companies are at various stages of launching flying cars in a variety of models. Although the earlier prototypes were not successful, they have paved the way for today’s more advanced models. With the more recent development and popularity of drones, several companies have designed passenger models. These include two Chinese companies, XPeng, which is backed by the e-commerce company Alibaba, and EHang, which is supplying the United Arab Emirates with autonomous taxis. The drones run on electric motors. (Quillette)
  • On the wings of Ada Lovelace. She detailed her ideas for a “flying machine” in the spring of 1828, writing to her mother: “I have got a scheme … which, if ever I effect it … is to make a thing in the form of a horse with a steam engine in the inside so contrived as to move an immense pair of wings, fixed on the outside of the horse, in such a manner as to carry it up into the air while a person sits on its back” (April 7, 1828, excerpted in Ada, the Enchantress of Numbers). (UC Berkeley Blog)
  • How well can an AI mimic human ethics? For a long time, a background assumption in many parts of the AI field was that to build intelligence, researchers would have to explicitly build in reasoning capacity and conceptual frameworks the AI could use to think about the world. Early AI language generators, for example, were hand-programmed with principles of syntax they could use to generate sentences. Now, it’s less obvious that researchers will have to build in reasoning to get reasoning out. It might be that an extremely straightforward approach like training AIs to predict what a person on Mechanical Turk would say in response to a prompt could get you quite powerful systems. (Vox)
  • What Is Customer Satisfaction Score (CSAT)? A CSAT score is easy to calculate. It’s the sum of all positive responses, divided by the total responses collected, then multiplied by 100. The outcome leaves you with the overall percentage of satisfied customers at your business. A big strength of Customer Satisfaction Score lies in its simplicity: It’s an easy way to close the loop on a customer interaction and determine whether or not it was effective in producing happiness. (Hubspot)
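The CSAT calculation described in the last item above can be sketched in a few lines of Python. This is only an illustration: the function name `csat_score`, the sample ratings, and the convention that ratings of 4 or 5 on a 1-to-5 scale count as "positive" are assumptions, not something the linked article prescribes.

```python
def csat_score(responses, positive_threshold=4):
    """Return the percentage of responses counted as positive.

    Assumes numeric ratings where values at or above positive_threshold
    (4 on a 1-to-5 scale, by assumption) count as satisfied.
    """
    if not responses:
        raise ValueError("at least one response is required")
    # Positive responses divided by total responses, multiplied by 100.
    positive = sum(1 for rating in responses if rating >= positive_threshold)
    return positive / len(responses) * 100


ratings = [5, 4, 3, 5, 2, 4, 1, 5]  # made-up sample survey results
print(f"CSAT: {csat_score(ratings):.1f}%")
```

With these sample ratings, five of the eight responses are positive, so the score is 62.5 percent.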

Early Week Readings – Nov 15, 2021

  • Apple’s Relentless Strategy, Execution, and Point of View. Apple’s announcement of “Apple Silicon” is important for many reasons. Delivering on such an undertaking is the result of remarkable product engineering. (Learning By Shipping)
  • Qualcomm is researching machine learning at the edge. So far, ML at the edge has only involved inference, the process of running incoming data against an existing model to see if it matches. Training the algorithm still takes place in the cloud. But Qualcomm has been researching ways to make the training of ML algorithms at the edge less energy-intensive, which means it could happen at the edge. (StacyOnIoT)
  • Wi-Fi HaLow is now certified and ready for action. About five and a half years ago the Wi-Fi Alliance announced that it was planning a new Wi-Fi standard just for the IoT. It was dubbed Wi-Fi HaLow and the IEEE standard was called 802.11ah. The whole goal of the new standard was to tackle the high power consumption of traditional Wi-Fi and to have the signals stretch over longer ranges. (StacyOnIoT)
  • Tableau Pledges to Train 10 Million Data People. With demand for data skills outpacing supply, data skills are no longer exclusively essential for data scientists or technical roles — to build truly data-driven organizations, employees across the entire enterprise must be data literate. This will help companies become data-driven and strengthen the Tableau Economy – a rapidly growing ecosystem of businesses, tech partners and people leading the world’s data transformations. (Tableau Blog)
  • Why We Forgive Humans More Readily Than Machines. Today, much of that moralizing is not aimed at the wrong pair of socks but at AI and at those who create it. Often the outrage is justified. AI has been involved in wrongful arrests, biased recidivism scores, and multiple scandals involving misclassified photos or gender-stereotypical translations. And for the most part, the AI community has listened. Today, AI researchers are well aware of these problems and are actively working to fix them. (Scientific American)
  • A Half Century Later, the Journey of Apollo 8 Still Inspires. It’s hard to believe Apollo 8’s voyage around the moon had originally been scheduled as a less audacious Earth-orbit mission to test the whole moonship “flotilla”: the monstrous, still problem-prone Saturn 5 booster, along with the recently redesigned, and only once-flown-by-astronauts Apollo command ship, which was fashioned to carry a three-person crew to and from Earth and into moon orbit. For a landing, it was to fly in tandem with a lunar lander that would ferry two astronauts to and from the moon’s surface. (Scientific American)