Your iPhone probably outperforms your computer.
Posted by: Darth-Apple - June 5th, 2019 at 11:43 PM - Forum: Technology & Hardware - Replies (2)
Five years ago, phones were considerably restricted in their performance compared to their desktop counterparts. Raw computing power belonged to more capable devices: machines with bigger batteries and bigger fans that could keep high-power Intel CPUs cool.
Today, this is not so much the case. Apple's custom CPU designs are so powerful that they can outperform virtually any dual-core Intel processor made to date. That is despite those chips having Turbo Boost, significantly higher TDPs, full fan-cooled designs, hyper-threading, and of course much higher clock speeds. The phone in your pocket can beat almost all of them without a hitch, and it does so with a ~2.5 GHz CPU and a roughly 3-watt TDP.
Apple has done a fantastic job of designing these new processing chips, and one of the latest rumors is that they will be designing these chips for Mac computers in the near future. Although this would break compatibility with current x86 compiled code, this would still be revolutionary because these chips would not only be more powerful, but they’d be much lower-powered as well, allowing for significant battery gains.
But how is this done? How can a tiny, fanless cell-phone CPU outperform a beefy Intel dual-core counterpart? The secret lies in a number of factors, but one of them is captured by the term RISC, which stands for reduced instruction set computer. Modern x86-based processors are known as CISC, or complex instruction set computers. These processors have thousands of instructions, and very deep circuitry is involved in decoding them. Once decoded, they are turned into micro-ops (or uops for short), a much simpler format that is what actually gets sent to the processor's execution units. All of this decoding requires power and circuitry. In other words, Intel processors decode their CISC instructions into simpler operations on the fly, and those operations resemble RISC instructions by the time they are executed.
The difference is that RISC processors receive code that is already compiled in this simpler format. x86 CPUs require heavy decoding logic to translate their instructions in real time.
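To make that concrete, here's a rough Python sketch of the idea. It's purely illustrative: the instruction names and micro-ops are made up, not anything Intel or ARM actually uses.
Code:
# Toy illustration: a complex "add register to memory" style instruction
# gets decoded into three simpler micro-ops on the fly, while a RISC
# program would already ship those operations as separate instructions.

def decode_cisc(instruction):
    """Decode a (made-up) CISC-style instruction into micro-ops."""
    op, dest, src = instruction
    if op == "ADD_MEM":                      # e.g. add [addr], reg
        return [("LOAD", "tmp", dest),       # fetch the memory operand
                ("ADD", "tmp", src),         # do the actual arithmetic
                ("STORE", dest, "tmp")]      # write the result back
    return [instruction]                     # simple ops pass through as-is

cisc_program = [("ADD_MEM", "0x1000", "r1")]
risc_program = [("LOAD", "r2", "0x1000"),    # the compiler already emitted
                ("ADD", "r2", "r1"),         # the simple form, so no heavy
                ("STORE", "0x1000", "r2")]   # decode step is needed on-chip

for ins in cisc_program:
    print(decode_cisc(ins))   # this decode work happens in hardware, every time
print(risc_program)
The decode step in this sketch is trivial, but on a real x86 chip that translation is done by power-hungry circuitry on every instruction, which is part of the overhead the RISC approach sidesteps.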
The origins of the Apple CPU.
Apple CPUs have always been ARM-based rather than x86-based. ARM is an entirely different architecture, with radically different design principles. Interestingly enough, ARM, as a company, does not actually manufacture its chips. It merely designs them and allows individual manufacturers to license them and to manufacture (or modify) them as they please. Because of this, there is actually quite a large degree of variety in the ARM CPU market. ARM CPUs are common: virtually every cell phone and tablet, most Chromebooks, and a few laptops are powered by them. Despite their variety, they share a strictly defined instruction set, and one ARM CPU is binary-compatible with another.
However, ARM CPUs are designed for extremely low power consumption and for high efficiency, which differs principally from x86 based designs. They are designed not necessarily for performance, but for efficiency. There are not many complex instructions that must be decoded into simpler instructions. Instead, the entire binary that gets sent to the processor has already been decoded into these simpler instructions by the compiler. There are generally more of these instructions, making binaries larger, but they are significantly easier to decode in the processor, thus saving time and power in the process.
This, of course, is an incredibly simplistic explanation for just one of the reasons why ARM CPUs are much lower in their power consumption. In reality, there are a variety of reasons for this, many of which go beyond the scope of this post. Of course, the comparison between CISC and RISC goes beyond the scope of this article in many other ways as well, as there are actually advantages to both. For example, because CISC processors can accomplish more in fewer instructions, there is often an advantage of less memory traffic to worry about. ARM CPUs require more cache to mitigate some of this.
However, as evidenced by the benchmarks performed on Apple’s implementations of the ARM architecture, ARM does have a lot of potential, and quite a bit of room to grow. And Apple’s A11 and A12 are the first implementations of the ARM architecture that have been able to outperform Intel CPUs in a significant capacity.
Why are these CPUs so fast?
Well, Apple hasn’t published a whole lot of information regarding the specifics of their architecture. Little is known about many of the deeper workings of the processors, but we do know a few things that may point to why these processors excel in benchmarks.
- These processors have a massive 8MB L2 cache on-chip.
- These processors are incredibly wide architectures. So wide that no desktop CPU is even in the same class.
- Due to their wide architecture, these processors can decode 7 instructions in a single clock cycle and dispatch them to any of 13 execution units. For comparison, Intel's modern processors (6th generation and newer) can decode 5 instructions every cycle and send them to 8 different execution units.
This last point is likely one of the major reasons that this chip is so fast. It’s an incredibly wide architecture.
What is a “wide architecture”? How does a chip dispatch seven instructions at once?
Modern processors notably do not execute code in order. Consider this:
Code:
A = 1
B = 2
X = A + B
C = 5
D = X + C
F = 1
Your processor will execute these instructions in an entirely different order than they are written here. The processor will execute both A and B in parallel, by two different execution units, running on the same clock cycle. There are generally several execution units available, including a few ALUs for integer arithmetic, at least one floating point unit, as well as additional units for address calculation, and for loading and storing data. There is usually more than one of each given type of execution unit, but not all execution units can process all types of instructions. The processor has a dedicated scheduling unit, whose job is specifically to determine which execution units are free, and to send instructions to the appropriate units for the job.
The processor will wait until the results of those are ready before executing line 3, because line 3 depends on the previous two lines. However, a very wide processor has room to execute lines 4 and 6 before line 3 even begins. If needed data is not yet available, it will keep looking ahead in the code for other instructions that do not depend on preceding ones. Wide architectures are capable of executing several instructions at once, so long as there are instructions in the code that do not have unresolved dependencies.
This concept is called out of order execution. And rest assured, it's an extremely complex concept that involves quite a bit of intricate circuitry to implement. Virtually every modern processor, even in relatively low-end cell phones (with the exception of bottom-end Cortex-A53 based devices), implements out of order execution. On the desktop side, the feature has been around since the Pentium Pro era.
If a processor were to complete these instructions in order, the total throughput would be dramatically reduced.
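To get a feel for the difference, here's a minimal Python sketch of that six-line example. It's a toy model that assumes every instruction takes one cycle and there are more than enough execution units; real schedulers are far more complicated.
Code:
# Toy scheduler: each instruction is (destination, [values it depends on]).
program = [
    ("A", []),          # A = 1
    ("B", []),          # B = 2
    ("X", ["A", "B"]),  # X = A + B
    ("C", []),          # C = 5
    ("D", ["X", "C"]),  # D = X + C
    ("F", []),          # F = 1
]

# Strictly in order, one instruction per cycle: six cycles for six instructions.
print("in-order cycles:", len(program))

# Out of order and "infinitely wide": an instruction issues as soon as
# everything it depends on has finished.
finish = {}
for dest, deps in program:
    finish[dest] = max((finish[d] for d in deps), default=0) + 1
print("out-of-order cycles:", max(finish.values()))   # 3 cycles instead of 6
The same six instructions finish in three cycles instead of six, simply because independent work gets pulled forward.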
The catch with out of order execution is that many instruction sequences are highly sequential in nature. For example:
Code:
A = someMath
B = A + 1
C = sqrt(B)
D = C - 3
E = D
E = D + 1
If (E = 5):
    A = 1
Else:
    A = 5
There is not a single instruction in this code that can be executed in parallel. These instructions are entirely sequential, one after the other.
However, as discussed in the previous post about modern processor architectures, branch prediction will come into play in this example. The processor can look ahead and see a branch statement, which is an if/else block. A unit known as a branch predictor will attempt to determine which outcome is more likely, and will speculatively execute this outcome.
How do you keep results consistent when instructions are processed in a seemingly scrambled order?
How this is done in modern processors is highly complicated. The CPU usually keeps a table of previously seen branches and tracks which direction each of them has gone. If, for example, the code runs in a loop and the branch above has been taken many times before, the CPU can safely assume it will be taken again. Modern processors keep track of this at lightning speed and can reach 95% accuracy in many cases. They are so good at predicting branches that much of the performance impact of longer pipelines is significantly mitigated by the branch predictor unit.
However, on a brand-new branch that the code has never hit before, the processor has no history to tell it which way the branch has gone in the past. In these cases a modern processor uses what is known as static branch prediction, and it usually assumes that a branch is more likely not to be taken than to be taken. As a result, it will almost always jump to the else block and execute those results in parallel while the preceding instructions are being completed. If it gets to the end of the preceding instructions and it guessed correctly, it keeps the results it already calculated. If it guessed incorrectly, it throws away the results, flushes the pipeline, and starts over down the correct path.
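As a very rough sketch of the idea, here's a toy Python model: a table of two-bit counters indexed by branch address for branches we've seen before, and a static "not taken" guess for ones we haven't. The numbers and the branch address are made up; real predictors are vastly more sophisticated.
Code:
# Toy branch predictor: two-bit saturating counters per branch address.
# Counter values 0-1 predict "not taken", 2-3 predict "taken".
counters = {}

def predict(branch_addr):
    return counters.get(branch_addr, 0) >= 2   # unseen branch -> "not taken"

def update(branch_addr, taken):
    c = counters.get(branch_addr, 0)
    counters[branch_addr] = min(c + 1, 3) if taken else max(c - 1, 0)

# A loop branch that is taken 9 times and then falls through, over and over:
history = ([True] * 9 + [False]) * 10
correct = 0
for outcome in history:
    correct += (predict(0x400) == outcome)
    update(0x400, outcome)
print("accuracy: %d%%" % (100 * correct / len(history)))   # 88% on this pattern
Even this crude two-bit scheme gets most loop branches right, which is a big part of why long pipelines are survivable at all.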
Compilers are generally aware of the algorithms a processor uses to predict a branch. If a compiler can determine that a branch is unlikely to be taken, it lays the code out so that the processor's default guess will be the right one. In other words, the compiler looks at your code, decides which outcome is more likely, and arranges the resulting binary so that the processor guesses that more likely outcome correctly.
Whoa mate, we have a problem.
Suppose that the actual code execution sequence gets reorganized, and looks like this:
Code:
A = someMath
If (E = 5):
    A = 1
Else:
    A = 5
B = A + 1
C = sqrt(B)
D = C - 3
E = D
E = D + 1
There is a slight problem with this reorganization of the code.
When it reaches the branch statement, the processor speculatively sets A to 1 earlier than it is supposed to. We don't yet know for sure whether that is the value we want; the processor speculatively executes it and checks later. The trouble is that the instructions that come after the speculatively executed branch (B = A + 1 and everything that follows) depend on A still holding someMath, and we've speculatively set A to 1 instead. Not only do we have to check whether the branch prediction was accurate after this code executes, we also have to throw away all of those results, because every instruction that followed used the wrong version of A. They depended on a version of A that was not supposed to be overwritten yet.
As a result, whether or not our initial branch prediction was correct, all of the results that followed it have to be thrown out. This is a waste of time if there ever was one, and it is actually a huge problem in out of order execution.
This is solved, in part, with register renaming.
What the hell? Register renaming? What?
In modern CPU architectures, there are actually multiple physical registers that correspond to any given register that can be used for instructions. In other words, if you have register A available, the processor has multiple copies of this register, despite only one of these registers being logically visible to the program. The compiler can only see one of these.
To the programmer, these duplicated registers do not exist. The CPU only has a single register for each designated name that is available to the compiler, and thus, to the programmer.
The benefit, however, is that it allows the CPU to keep track of multiple versions of a register. In the code example above, two physical registers will both be designated as A, with each carrying a different version of A. Thus, when the if/else block is speculatively executed, its result is stored in a separate physical register, which only replaces the original one once the code that precedes it has been executed in proper, logical order. In other words, the CPU holds different versions of A and knows which version to use depending on where it is in the code execution process.
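Here's a stripped-down Python sketch of the renaming idea, using the example above. It's a toy model with a handful of made-up physical register names; real rename tables, free lists, and reorder buffers are far more involved.
Code:
# Toy register renaming: every *write* to an architectural register
# (A, B, C, ...) gets a brand new physical register at decode time.
# Instructions that need the old value keep reading the old physical
# register, so a speculative early write can't clobber it.
free_physical = ["p0", "p1", "p2", "p3", "p4", "p5"]
rename_table = {}            # architectural name -> current physical register

def rename(dest, sources):
    srcs = [rename_table[s] for s in sources]   # current versions of the inputs
    phys = free_physical.pop(0)                 # fresh register for the output
    rename_table[dest] = phys
    return dest, phys, srcs

# Decode happens in program order, even though execution may not:
print(rename("A", []))       # A = someMath        -> A lives in p0
print(rename("B", ["A"]))    # B = A + 1           -> reads p0, writes p1
print(rename("C", ["B"]))    # C = sqrt(B)         -> reads p1, writes p2
print(rename("A", []))       # speculative A = 1   -> writes p3, p0 untouched
Even if that last write executes early, B and C still read p0, so their results don't have to be thrown away when the speculation turns out wrong.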
Modern CPUs actually hold anywhere from 128 to 200+ instructions in a buffer before they are executed, and this buffer serves as the pool of instructions the CPU can reorder from. In the code above, we only have a few lines. In a modern CPU, the architecture can look ahead for work that is hundreds of instructions away, generally searching for instructions that don't depend on earlier code being completed first.
What we ultimately have is a super powerful processor capable of dispatching multiple instructions in parallel from seemingly random parts of the program. The term for this is a superscalar architecture, which is described in more detail on this page.
If a cell phone processor can outperform Intel's offerings, we've accomplished something pretty astounding.
This Reddit page offers quite a bit of additional information if you’re interested.
Pentium 4 Prescott - the worst CPU ever made
Posted by: Darth-Apple - June 5th, 2019 at 1:50 AM - Forum: Technology & Hardware - Replies (1)
Most of us probably remember the "blazing fast" Pentium 4 CPUs that debuted in the early 2000s. By today's standards, they could barely run a modern system, if at all; even on a single core, modern CPUs are many times faster. The Pentium 4 was especially controversial because its architecture was heavily reworked from earlier generations (Pentium 3 and below) for exactly one purpose: to achieve the highest clock speed possible, largely for marketing purposes. This came at a considerable performance penalty on a clock-for-clock basis. As a result, at the same clock speed, a Pentium 3 would considerably outperform a Pentium 4. The newer chips were only faster because they could clock to 3 GHz and beyond, where the original Pentium 3 topped out around 1 GHz.
As history has shown, this architecture was released during a time when Intel was having quite a bit of trouble deciding where it was going, and the company was still figuring out how to build a good processor at these speeds. At the time, P4-based systems were fast, but they have not stood the test of time. Today, the Pentium 4 is known as perhaps the biggest architectural flop in the company's history. It had enormous power consumption and heat output, and it was very slow on a clock-for-clock basis compared to every architecture that came after it and most of the architectures that preceded it. However, because it was the first Intel processor to push far past the 1 GHz barrier and on to 3 GHz and beyond, it remains an important piece of Intel's processor history.
TL;DR: The Pentium 4 was fast because it ran in uncharted territory, exploring high clock speeds for the first time in Intel's history. However, it was an architectural nightmare: highly inefficient, with an unholy amount of heat and unrealistic power consumption. It was an incredibly poor design, and it was completely scrapped when Intel came out with the Core 2 Duo series, basing those new processors on the Pentium 3 lineage instead.
The original Pentium 4 was better than the revised versions that followed it.
Usually, when you revise a processor, the new version is supposed to be better than the one it replaces. With the Pentium 4, this was not the case. The original Pentium 4 cores (Willamette, then Northwood) were still considered decent for their time. They were fast, could reach high clock speeds, and were quickly a hit. Despite having worse clock-for-clock performance than a Pentium 3, the higher clock speeds quickly made up for the difference.
A few years later, Intel created a new revision of the architecture, known as Prescott, to replace Northwood in all Pentium 4 CPUs. In the history of Intel processors, this is known as one of the worst revisions the company has ever made. It was so bad that Prescott CPUs actually generated more heat, required more power, and performed worse than the Northwood CPUs they were designed to replace. The goal was to clock the Prescott cores higher than their Northwood counterparts to offset the difference. However, the Prescott cores were so inefficient that it was highly impractical to raise the clock speeds far enough to close the gap. Almost all Prescott cores, even when clocked higher than the old cores, were considerably slower than the old Pentium 4s; often the performance difference was 15% or more.
Why would a newer core be slower?
The reason for this comes down to a concept that can be elusive in processor architecture: pipelining. It may seem that a longer pipeline is better, but this is usually not the case. Pentium 3 CPUs had a pipeline of about 10 stages. The original Northwood-era Pentium 4s had a pipeline of about 20 stages. The Prescott Pentium 4 chips had a pipeline of about 31. Modern Intel CPUs, for comparison, have pipelines of about 14-19 stages. Much better. In general, the longer the pipeline, the slower the processor on a clock-for-clock basis. However, longer pipelines are often used to reach higher clock speeds, which shorter pipelines can't as easily achieve.
But what is a pipeline?
It turns out that CPUs do not actually complete a single instruction in a single clock cycle. It takes many cycles to fully execute an instruction. The instruction has to be fetched from cache (clock cycles), decoded (usually a few cycles), executed (at least one, sometimes several cycles), and written back to the registers and memory. The reason some of these operations take several cycles is the complex gate circuitry in the processor. It takes time for a transistor to flip from one state to another, and if a single stage contains very deep logic circuitry, the processor won't be able to reach high clock speeds because each stage takes too long to complete.
A pipeline splits these instruction logic processes into a series of much simpler steps that can be completed much more quickly. Each step represents one clock cycle, and with smaller steps, the processor can reach much higher clock speeds. A pipeline loads many instructions at once in an assembly line fashion. As a result, even though a single instruction may take ~20 cycles to execute, the processor is also working on 20 instructions at any given time, so the effective result is that one instruction is still done every clock cycle.
[url=https://en.wikipedia.org/wiki/Pipeline_(computing)]Wikipedia[/url] has a much better explanation for this than I could provide here.
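Here's a quick back-of-the-envelope Python sketch of the assembly-line effect. It uses toy numbers and assumes one instruction enters the pipeline per cycle with no stalls of any kind.
Code:
# With a pipeline of depth D and N instructions, the first instruction
# takes D cycles to come out the other end; after that, one instruction
# finishes every cycle.
def pipelined_cycles(depth, n_instructions):
    return depth + (n_instructions - 1)

def unpipelined_cycles(depth, n_instructions):
    # Without pipelining, each instruction occupies the whole machine
    # for all of its stages before the next one can even start.
    return depth * n_instructions

for depth in (10, 20, 31):        # roughly P3, Northwood, Prescott depths
    print(depth, pipelined_cycles(depth, 1000), unpipelined_cycles(depth, 1000))
# 1000 instructions: ~1009-1030 cycles pipelined vs. 10000-31000 unpipelined.
As long as nothing interrupts the flow, the depth of the pipeline barely matters. The trouble starts when something does.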
So why are longer pipelines slower if they are constantly being fed with instructions anyway?
The problem is the unruly branch statement. Consider the following code for a second here:
Code:
value = getPasswordFromUser();
if (value = password):
    log_user_in();
else:
    ask_for_password_again();
This is a branch statement. It states that if a certain value matches some criteria, the code is supposed to do this. If not, it will do something else instead. These types of branching statements are very, very common. So common that a pipeline with a 20 stage length will, on average, have a couple of them sitting in it at any given time.
The issue is that you don't actually know what value is going to be until the instruction that produces it makes its way through the entire pipeline. Because of this, even though the instructions for the if statement are sitting right behind that command in the pipeline, the processor has no idea which code to actually execute. It has to wait for the pipeline to drain so that it knows what value is equal to, and by the time it's done, it has wasted 20 cycles.
This is why the Prescott cores were so bad. They had to waste 31 cycles every time something like this happened, because of their longer pipeline. And if a branch like this came along every 5-10 instructions, the processor would execute 5-10 instructions and then stall for up to 31 cycles.
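To see why the extra stages hurt so much, here's a crude Python estimate. The workload numbers are completely made up (a branch every 8 instructions, a 10% misprediction rate); the point is just how the penalty scales with pipeline depth.
Code:
# Crude model: every mispredicted branch costs roughly one full
# pipeline's worth of cycles to flush and refill.
def effective_cpi(pipeline_depth, branch_every=8, mispredict_rate=0.10):
    # Ideal throughput is 1 cycle per instruction; each instruction then
    # adds its share of the expected flush penalty.
    penalty_per_instruction = (mispredict_rate / branch_every) * pipeline_depth
    return 1.0 + penalty_per_instruction

for depth in (20, 31):
    print("depth %d -> %.2f cycles per instruction" % (depth, effective_cpi(depth)))
# Under these made-up assumptions: 1.25 for the 20-stage core vs. 1.39 for the
# 31-stage core, so the deeper pipeline needs a higher clock just to break even.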
If so many cycles are wasted on every branch statement, how is this mitigated?
Modern processors mitigate this through a concept known as branch prediction. There is quite a bit of complex circuitry involved, but essentially, when the processor sees an if/else statement coming, it tries to guess which path is more likely to be taken and speculatively executes that path, without actually knowing the outcome until the preceding instructions exit the pipeline. If the processor's guess was correct, it keeps executing. Otherwise, it throws away the results and starts over with the correct branch. As a result, it only wastes those cycles when it guesses wrong, which mitigates the problem significantly. Modern processors are extremely good at this.
The Pentium 4 era CPUs were terrible at this by comparison. They performed branch prediction, but not well. They generally guessed that a branch would never be taken and would immediately jump to the else block, and they would do roughly the same thing no matter what code was fed into them. As you can imagine, this was fairly inefficient and resulted in a lot of very long pipelines being flushed.
This is one of the reasons modern processors are so much faster, despite often running at lower clock speeds. Intel and AMD have done a fantastic job of integrating the complicated circuitry needed to keep the processor utilized to its full capacity. Unfortunately, the Pentium 4 was a flop on the way to this goal, and the modern Core-based processors were actually based on the Pentium 3 lineage, completely scrapping the work that went into the Pentium 4. The P4 was so bad that Intel literally threw it away and started over, basing the new design on the very architecture the P4 was supposed to replace.
Nevertheless, design mistakes like these are part of modern computing history, and they taught Intel how to build the powerful processors it has today. Rest assured, they are a true marvel of engineering.
Sim Copter Finally patched for modern systems!
Posted by: SpookyZalost - June 1st, 2019 at 1:23 AM - Forum: Other Games - Replies (6)
So, great news guys!
I don't know if anyone remembers or played Sim Copter back in the day, but someone finally made it work on modern systems by literally creating their own patch!
http://simcopter.net/index.html
They also plan to update Streets of SimCity as well. This is super exciting; I loved both of these games back in the day, and I'm super stoked to be able to play them again without them crashing on anything but my Pentium III!
the soft doctrines of Imaginos: Mythology behind the Blue Öyster Cult
Posted by: SpookyZalost - May 26th, 2019 at 12:18 AM - Forum: Creative Writing - No Replies
Hey, so has anyone heard of or looked into the history of Blue Öyster Cult's music and the story behind their strange, esoteric lyrics?
Apparently it was all either taken from or inspired by an unpublished work by Sandy Pearlman, who in turn went on to become the band's manager.
It's all really interesting stuff, with the original work being inspired by the writings of H.P. Lovecraft, various religions, and history as a whole.
I mean, listen to this piece from their 1988 album Imaginos.
It's actually kind of fascinating, the story it tells, once you really start listening to their work and finding the plot threads.
As near as I can tell, it's a story involving aliens, psychic energy, a supernaturally empowered cult, and the world wars, which were caused by a kind of psychic backlash left over from medieval/colonial Europe.
2019 NBA Playoffs
Posted by: brian51 - May 24th, 2019 at 9:16 AM - Forum: Sports - Replies (15)
Big win for Toronto last night in an away game.
This now puts them up 3-2 in their series with the Milwaukee Bucks in the Eastern Conference finals.
Brand new Mac stolen...
Posted by: Darth-Apple - May 23rd, 2019 at 4:00 PM - Forum: Technology & Hardware - Replies (7)
So y'all, I have some relatively bad news.
I bought a brand new MacBook Air (the 2018 Retina model) about three weeks ago. The thing was honestly, quite frankly, an awesome piece of technology. The battery life was insanely good. It wasn't the fastest machine in the world (1.6 GHz dual core, although the turbo would go up to ~3 GHz for a few seconds at a time while you were opening programs or booting).
Anyway, I was downtown with some people last week, and it got stolen out of my car. I suppose I forgot to lock it, but either way, it's nowhere to be found. Find my iPhone has been no good whatsoever, and it hasn't been connected to the internet since I lost it. I've locked it, and nobody will be able to do anything to it without my password that I locked it with.
I've also filed a police report, and if someone tries to pawn it, it's gonna turn up.
Wish me luck getting this thing back, guys. Hopefully I'll get it returned to me sooner rather than later, but if not, I suppose I'm out ~$1,000.
On the bright side, at least I have my 2012, and this thing is still a champ. Still works great, and will probably last several more years before it's completely obsolete.
...
Posted by: theexplorer - May 18th, 2019 at 3:31 AM - Forum: General Discussion - Replies (3)
i've always kind of believed that everyone makes their own way in life, when it comes to things like beliefs, friends, personalities.. just the way you are. and if something happens and you have a hard time, then it's your own fault and you should change something.
i realize now that this is just what society wants you to feel. when everyone around you kicks you down and leaves you to suffer it's your own fault and that you're a bad person for it. maybe you just need to think more positively and get your mind off the bad stuff, and maybe you even deserve it because you're such a shitty person.
it's just to make you feel like the corrupt one. it's constructed by all the winners in order to take advantage of the losers, then dispose of them when they are no longer useful to them. that's how it all works. if you're a loser, you're destined to a life of pain and suffering at the hands of the winners, who don't feel a thing and perhaps even feel good about using and abusing the losers. even though the losers are in their position due to no fault of their own, they're still destined to this kind of life.
that's why I don't trust anyone anymore. after all this bullshit, i'm done putting my trust in any single human being at all. it's not worth it to me anymore, including the people who i once thought were my friends. everyone would just be perfectly content to see my suffering, along with those others who they have no problems with beating down and kicking aside.
if anyone actually reads this they will likely think i'm crazy but we all are in our own ways. i just felt like venting to some people that i trusted at one time. i know it won't change anything but thanks for reading this anyway.
by the way this can be deleted whenever, just felt the need to post it.
Chatbots
Posted by: SpookyZalost - May 17th, 2019 at 1:28 AM - Forum: Technology & Hardware - Replies (6)
Anyone remember chatbots back in the day?
The early DIY attempts at creating programs that could communicate with people, in some cases in an effort to beat the Turing test.
ALICE and its variants.
And of course the more... questionable ones designed for the more... adult among us.
Anyone still using them, or seen them used more recently?
Video Course Opinions?
Posted by: Lain - May 16th, 2019 at 2:44 AM - Forum: Software - Replies (6)
I've been feeling really confident about my programming skills over the last few weeks, since I've been able to actually pump out a decent number of projects that work and have some degree of optimization. As such, I've considered making my own little video tutorial series on programming (specifically C++, for a number of reasons). If anything, the series will be put up on YouTube, since I'm pretty new to content creation and don't want to monetize immediately with Udemy or something. Knowledge should be free.
So I'm asking you guys, have you ever tried to learn from a video series? What was your biggest pet peeve with it? Why did you stop (if you stopped) or what kept you coming back to learn from that one series of tutorials?
Right now, I want to focus on adding some sort of 'homework': small projects that I'll explain at the end of a video and then go over a solution for in the next video, or something like that. Since I'm a bit mic/camera shy, I've also noticed my videos take pretty long, since 1. I'm giving some pretty detailed explanations and 2. I'm speaking a bit slowly from shyness.
What else would you personally like to see? (Not that I'm gonna spam advertise my shit here, but just so I know what the audience is looking for rather than simply going for it without a regard for the viewers.)